diff --git a/vendor/stb/.github/CONTRIBUTING.md b/vendor/stb/.github/CONTRIBUTING.md
new file mode 100644
index 0000000..54e3543
--- /dev/null
+++ b/vendor/stb/.github/CONTRIBUTING.md
@@ -0,0 +1,32 @@
+Pull Requests and Issues are both welcome.
+
+# Responsiveness
+
+General priority order is:
+
+* Crashes
+* Security issues in stb_image
+* Bugs
+* Security concerns in other libs
+* Warnings
+* Enhancements (new features, performance improvement, etc)
+
+Pull requests get priority over Issues. Some pull requests I take
+as written; some I modify myself; some I will request changes before
+accepting them. Because I've ended up supporting a lot of libraries
+(20 as I write this, with more on the way), I am somewhat slow to
+address things. Many issues have been around for a long time.
+
+# Pull requests
+
+* Make sure you're using a special branch just for this pull request. (Sometimes people unknowingly use a default branch, then later update that branch, which updates the pull request with the other changes if it hasn't been merged yet.)
+* Do NOT update the version number in the file. (This just causes conflicts.)
+* Do add your name to the list of contributors. (Don't worry about the formatting.) I'll try to remember to add it if you don't, but I sometimes forget as it's an extra step.
+* Your change needs to compile as both C and C++. Pre-C99 compilers should be supported (e.g. declare variables at the start of a block).
+
+# Specific libraries
+
+I generally do not want new file formats for stb_image because
+we are trying to improve its security, so increasing its attack
+surface is counter-productive.
+
diff --git a/vendor/stb/.github/ISSUE_TEMPLATE/1-stb_image-doesn-t-load-specific-image-correctly.md b/vendor/stb/.github/ISSUE_TEMPLATE/1-stb_image-doesn-t-load-specific-image-correctly.md
new file mode 100644
index 0000000..88e83b2
--- /dev/null
+++ b/vendor/stb/.github/ISSUE_TEMPLATE/1-stb_image-doesn-t-load-specific-image-correctly.md
@@ -0,0 +1,15 @@
+---
+name: stb_image Doesn't Load Specific Image Correctly
+about: if an image displays wrong in your program, and you've verified stb_image is
+ the problem
+title: ''
+labels: 1 stb_image
+assignees: ''
+
+---
+
+1. **Confirm that, after loading the image with stbi_load, you've immediately written it out with stbi_write_png or similar, and that version of the image is also wrong.** If it is correct when written out, the problem is not in stb_image. If it displays wrong in a program you're writing, it's probably your display code. For example, people writing OpenGL programs frequently do not upload or display the image correctly and assume stb_image is at fault even though writing out the image demonstrates that it loads correctly.
+
+2. *Provide an image that does not load correctly using stb_image* so we can reproduce the problem.
+
+3. *Provide an image or description of what part of the image is incorrect and how* so we can be sure we've reproduced the problem correctly.
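The check in step 1 can be sketched as follows (a minimal sketch, assuming `stb_image.h` and `stb_image_write.h` are on the include path and the implementation macros are defined in exactly one translation unit; `input.png` is a placeholder for your problem image):

```c
/* In exactly one .c/.cpp file, before these includes:
   #define STB_IMAGE_IMPLEMENTATION
   #define STB_IMAGE_WRITE_IMPLEMENTATION                  */
#include <stdio.h>
#include "stb_image.h"
#include "stb_image_write.h"

int main(void)
{
    int w, h, n;
    unsigned char *data = stbi_load("input.png", &w, &h, &n, 0);
    if (!data) {
        fprintf(stderr, "load failed: %s\n", stbi_failure_reason());
        return 1;
    }
    /* If check.png is also wrong, the bug may be in stb_image;
       if check.png looks right, the problem is in your display code. */
    stbi_write_png("check.png", w, h, n, data, w * n);
    stbi_image_free(data);
    return 0;
}
```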
diff --git a/vendor/stb/.github/ISSUE_TEMPLATE/2-bug_report.md b/vendor/stb/.github/ISSUE_TEMPLATE/2-bug_report.md
new file mode 100644
index 0000000..a5cb26f
--- /dev/null
+++ b/vendor/stb/.github/ISSUE_TEMPLATE/2-bug_report.md
@@ -0,0 +1,24 @@
+---
+name: Bug report
+about: if you're having trouble using a library, try the support forum instead
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
diff --git a/vendor/stb/.github/ISSUE_TEMPLATE/3-feature_request.md b/vendor/stb/.github/ISSUE_TEMPLATE/3-feature_request.md
new file mode 100644
index 0000000..71c8763
--- /dev/null
+++ b/vendor/stb/.github/ISSUE_TEMPLATE/3-feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: 4 enhancement
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
diff --git a/vendor/stb/.github/ISSUE_TEMPLATE/config.yml b/vendor/stb/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 0000000..2cd43bb
--- /dev/null
+++ b/vendor/stb/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: false
+contact_links:
+ - name: support forum
+ url: https://github.com/nothings/stb/discussions/categories/q-a
+ about: having trouble using an stb library? don't create an issue, post in the forum
diff --git a/vendor/stb/.github/PULL_REQUEST_TEMPLATE.md b/vendor/stb/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 0000000..2b10daa
--- /dev/null
+++ b/vendor/stb/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,6 @@
+* Delete this list before clicking CREATE PULL REQUEST
+* Make sure you're using a special branch just for this pull request. (Sometimes people unknowingly use a default branch, then later update that branch, which updates the pull request with the other changes if it hasn't been merged yet.)
+* Do NOT update the version number in the file. (This just causes conflicts.)
+* Do add your name to the list of contributors. (Don't worry about the formatting.) I'll try to remember to add it if you don't, but I sometimes forget as it's an extra step.
+
+If you get something above wrong, don't fret; it's not the end of the world.
diff --git a/vendor/stb/.github/workflows/ci-fuzz.yml b/vendor/stb/.github/workflows/ci-fuzz.yml
new file mode 100644
index 0000000..332fca9
--- /dev/null
+++ b/vendor/stb/.github/workflows/ci-fuzz.yml
@@ -0,0 +1,23 @@
+name: CIFuzz
+on: [pull_request]
+jobs:
+ Fuzzing:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Build Fuzzers
+ uses: google/oss-fuzz/infra/cifuzz/actions/build_fuzzers@master
+ with:
+ oss-fuzz-project-name: 'stb'
+ dry-run: false
+ - name: Run Fuzzers
+ uses: google/oss-fuzz/infra/cifuzz/actions/run_fuzzers@master
+ with:
+ oss-fuzz-project-name: 'stb'
+ fuzz-seconds: 900
+ dry-run: false
+ - name: Upload Crash
+ uses: actions/upload-artifact@v1
+ if: failure()
+ with:
+ name: artifacts
+ path: ./out/artifacts
diff --git a/vendor/stb/.gitignore b/vendor/stb/.gitignore
new file mode 100644
index 0000000..8cc774f
--- /dev/null
+++ b/vendor/stb/.gitignore
@@ -0,0 +1,3 @@
+*.o
+*.obj
+*.exe
diff --git a/vendor/stb/.travis.yml b/vendor/stb/.travis.yml
new file mode 100644
index 0000000..c2ad947
--- /dev/null
+++ b/vendor/stb/.travis.yml
@@ -0,0 +1,8 @@
+language: c
+arch:
+ - amd64
+ - ppc64le
+install: true
+script:
+ - cd tests
+ - make all
diff --git a/vendor/stb/LICENSE b/vendor/stb/LICENSE
new file mode 100644
index 0000000..a77ae91
--- /dev/null
+++ b/vendor/stb/LICENSE
@@ -0,0 +1,37 @@
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/vendor/stb/README.md b/vendor/stb/README.md
new file mode 100644
index 0000000..2f21190
--- /dev/null
+++ b/vendor/stb/README.md
@@ -0,0 +1,184 @@
+
+
+stb
+===
+
+single-file public domain (or MIT licensed) libraries for C/C++
+
+# This project discusses security-relevant bugs in public in Github Issues and Pull Requests, and it may take significant time for security fixes to be implemented or merged. If this poses an unreasonable risk to your project, do not use stb libraries.
+
+Noteworthy:
+
+* image loader: [stb_image.h](stb_image.h)
+* image writer: [stb_image_write.h](stb_image_write.h)
+* image resizer: [stb_image_resize2.h](stb_image_resize2.h)
+* font text rasterizer: [stb_truetype.h](stb_truetype.h)
+* typesafe containers: [stb_ds.h](stb_ds.h)
+
+Most libraries by stb, except: stb_dxt by Fabian "ryg" Giesen, original stb_image_resize
+by Jorge L. "VinoBS" Rodriguez, and stb_image_resize2 and stb_sprintf by Jeff Roberts.
+
+
+
+library | latest version | category | LoC | description
+--------------------- | ---- | -------- | --- | --------------------------------
+**[stb_vorbis.c](stb_vorbis.c)** | 1.22 | audio | 5584 | decode ogg vorbis files from file/memory to float/16-bit signed output
+**[stb_hexwave.h](stb_hexwave.h)** | 0.5 | audio | 680 | audio waveform synthesizer
+**[stb_image.h](stb_image.h)** | 2.30 | graphics | 7988 | image loading/decoding from file/memory: JPG, PNG, TGA, BMP, PSD, GIF, HDR, PIC
+**[stb_truetype.h](stb_truetype.h)** | 1.26 | graphics | 5079 | parse, decode, and rasterize characters from truetype fonts
+**[stb_image_write.h](stb_image_write.h)** | 1.16 | graphics | 1724 | image writing to disk: PNG, TGA, BMP
+**[stb_image_resize2.h](stb_image_resize2.h)** | 2.16 | graphics | 10650 | resize images larger/smaller with good quality
+**[stb_rect_pack.h](stb_rect_pack.h)** | 1.01 | graphics | 623 | simple 2D rectangle packer with decent quality
+**[stb_perlin.h](stb_perlin.h)** | 0.5 | graphics | 428 | perlin's revised simplex noise w/ different seeds
+**[stb_ds.h](stb_ds.h)** | 0.67 | utility | 1895 | typesafe dynamic array and hash tables for C, will compile in C++
+**[stb_sprintf.h](stb_sprintf.h)** | 1.10 | utility | 1906 | fast sprintf, snprintf for C/C++
+**[stb_textedit.h](stb_textedit.h)** | 1.14 | user interface | 1429 | guts of a text editor for games etc implementing them from scratch
+**[stb_voxel_render.h](stb_voxel_render.h)** | 0.89 | 3D graphics | 3807 | Minecraft-esque voxel rendering "engine" with many more features
+**[stb_dxt.h](stb_dxt.h)** | 1.12 | 3D graphics | 719 | Fabian "ryg" Giesen's real-time DXT compressor
+**[stb_easy_font.h](stb_easy_font.h)** | 1.1 | 3D graphics | 305 | quick-and-dirty easy-to-deploy bitmap font for printing frame rate, etc
+**[stb_tilemap_editor.h](stb_tilemap_editor.h)** | 0.42 | game dev | 4187 | embeddable tilemap editor
+**[stb_herringbone_wa...](stb_herringbone_wang_tile.h)** | 0.7 | game dev | 1221 | herringbone Wang tile map generator
+**[stb_c_lexer.h](stb_c_lexer.h)** | 0.12 | parsing | 941 | simplify writing parsers for C-like languages
+**[stb_divide.h](stb_divide.h)** | 0.94 | math | 433 | more useful 32-bit modulus e.g. "euclidean divide"
+**[stb_connected_comp...](stb_connected_components.h)** | 0.96 | misc | 1049 | incrementally compute reachability on grids
+**[stb_leakcheck.h](stb_leakcheck.h)** | 0.6 | misc | 194 | quick-and-dirty malloc/free leak-checking
+**[stb_include.h](stb_include.h)** | 0.02 | misc | 295 | implement recursive #include support, particularly for GLSL
+
+Total libraries: 21
+Total lines of C code: 51137
+
+
+FAQ
+---
+
+#### What's the license?
+
+These libraries are in the public domain. You can do anything you
+want with them. You have no legal obligation
+to do anything else, although I appreciate attribution.
+
+They are also licensed under the MIT open source license, if you have lawyers
+who are unhappy with public domain. Every source file includes an explicit
+dual-license for you to choose from.
+
+#### How do I use these libraries?
+
+The idea behind single-header file libraries is that they're easy to distribute and deploy
+because all the code is contained in a single file. By default, the .h files in here act as
+their own header files, i.e. they declare the functions contained in the file but don't
+actually result in any code getting compiled.
+
+So in addition, you should select _exactly one_ C/C++ source file that actually instantiates
+the code, preferably a file you're not editing frequently. This file should define a
+specific macro (this is documented per-library) to actually enable the function definitions.
+For example, to use stb_image, you should have exactly one C/C++ file that doesn't
+include stb_image.h regularly, but instead does
+
+ #define STB_IMAGE_IMPLEMENTATION
+ #include "stb_image.h"
+
+The right macro to define is pointed out right at the top of each of these libraries.
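Any other source file then includes the header normally and just calls the API. A minimal usage sketch (a hypothetical `image.png`; assumes `stb_image.h` is on the include path and some file in the project defines `STB_IMAGE_IMPLEMENTATION`):

```c
#include <stdio.h>
#include "stb_image.h"   /* header-only usage: no IMPLEMENTATION macro here */

int main(void)
{
    int w, h, channels;
    /* Decode to 8-bit RGBA regardless of the file's own channel count. */
    unsigned char *pixels = stbi_load("image.png", &w, &h, &channels, 4);
    if (!pixels) {
        fprintf(stderr, "load failed: %s\n", stbi_failure_reason());
        return 1;
    }
    printf("%dx%d, %d source channels\n", w, h, channels);
    stbi_image_free(pixels);
    return 0;
}
```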
+
+#### Are there other single-file public-domain/open source libraries with minimal dependencies out there?
+
+[Yes.](https://github.com/nothings/single_file_libs)
+
+#### If I wrap an stb library in a new library, does the new library have to be public domain/MIT?
+
+No, because it's public domain you can freely relicense it to whatever license your new
+library wants to be.
+
+#### What's the deal with SSE support in GCC-based compilers?
+
+stb_image will either use SSE2 (if you compile with -msse2) or
+will not use any SIMD at all, rather than trying to detect the
+processor at runtime and handle it correctly. As I understand it,
+the approved path in GCC for runtime-detection requires
+you to use multiple source files, one for each CPU configuration.
+Because stb_image is a header-file library that compiles in only
+one source file, there's no approved way to build both an
+SSE-enabled and a non-SSE-enabled variation.
+
+While we've tried to work around it, we've had multiple issues over
+the years due to specific versions of gcc breaking what we're doing,
+so we've given up on it. See https://github.com/nothings/stb/issues/280
+and https://github.com/nothings/stb/issues/410 for examples.
+
+#### Some of these libraries seem redundant to existing open source libraries. Are they better somehow?
+
+Generally they're only better in that they're easier to integrate,
+easier to use, and easier to release (single file; good API; no
+attribution requirement). They may be less featureful, slower,
+and/or use more memory. If you're already using an equivalent
+library, there's probably no good reason to switch.
+
+#### Can I link directly to the table of stb libraries?
+
+You can use [this URL](https://github.com/nothings/stb#stb_libs) to link directly to that list.
+
+#### Why do you list "lines of code"? It's a terrible metric.
+
+Just to give you some idea of the internal complexity of the library,
+to help you manage your expectations, or to let you know what you're
+getting into. While not all the libraries are written in the same
+style, they're certainly similar styles, and so comparisons between
+the libraries are probably still meaningful.
+
+Note though that the lines do include both the implementation, the
+part that corresponds to a header file, and the documentation.
+
+#### Why single-file headers?
+
+Windows doesn't have standard directories where libraries
+live. That makes deploying libraries in Windows a lot more
+painful than open source developers on Unix derivatives generally
+realize. (It also makes library dependencies a lot worse in Windows.)
+
+There's also a common problem in Windows where a library was built
+against a different version of the runtime library, which causes
+link conflicts and confusion. Shipping the libs as headers means
+you normally just compile them straight into your project without
+making libraries, thus sidestepping that problem.
+
+Making them a single file makes it very easy to just
+drop them into a project that needs them. (Of course you can
+still put them in a proper shared library tree if you want.)
+
+Why not two files, one a header and one an implementation?
+The difference between 10 files and 9 files is not a big deal,
+but the difference between 2 files and 1 file is a big deal.
+You don't need to zip or tar the files up, you don't have to
+remember to attach *two* files, etc.
+
+#### Why "stb"? Is this something to do with Set-Top Boxes?
+
+No, they are just the initials for my name, Sean T. Barrett.
+This was not chosen out of egomania, but as a moderately sane
+way of namespacing the filenames and source function names.
+
+#### Will you add more image types to stb_image.h?
+
+No. As stb_image use has grown, it has become more important
+for us to focus on security of the codebase. Adding new image
+formats increases the amount of code we need to secure, so it
+is no longer worth adding new formats.
+
+#### Do you have any advice on how to create my own single-file library?
+
+Yes. https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
+
+#### Why public domain?
+
+I prefer it over GPL, LGPL, BSD, zlib, etc. for many reasons.
+Some of them are listed here:
+https://github.com/nothings/stb/blob/master/docs/why_public_domain.md
+
+#### Why C?
+
+Primarily, because I use C, not C++. But it does also make it easier
+for other people to use them from other languages.
+
+#### Why not C99? stdint.h, declare-anywhere, etc.
+
+I still use MSVC 6 (1998) as my IDE because it has better human factors
+for me than later versions of MSVC.
diff --git a/vendor/stb/SECURITY.md b/vendor/stb/SECURITY.md
new file mode 100644
index 0000000..f01caef
--- /dev/null
+++ b/vendor/stb/SECURITY.md
@@ -0,0 +1,2 @@
+# Security Policy
+This project discusses security-relevant bugs in public in Github Issues and Pull Requests, and it may take significant time for security fixes to be implemented or merged. If this poses an unreasonable risk to your project, do not use stb libraries.
diff --git a/vendor/stb/data/atari_8bit_font_revised.png b/vendor/stb/data/atari_8bit_font_revised.png
new file mode 100644
index 0000000..91c553c
Binary files /dev/null and b/vendor/stb/data/atari_8bit_font_revised.png differ
diff --git a/vendor/stb/data/easy_font_raw.png b/vendor/stb/data/easy_font_raw.png
new file mode 100644
index 0000000..2f08148
Binary files /dev/null and b/vendor/stb/data/easy_font_raw.png differ
diff --git a/vendor/stb/data/herringbone/license.txt b/vendor/stb/data/herringbone/license.txt
new file mode 100644
index 0000000..11ffc42
--- /dev/null
+++ b/vendor/stb/data/herringbone/license.txt
@@ -0,0 +1,4 @@
+All files in this directory are in the public domain. Where
+a public domain declaration is not recognized, you are granted
+a license to freely use, modify, and redistribute them in
+any way you choose.
\ No newline at end of file
diff --git a/vendor/stb/data/herringbone/template_caves_limit_connectivity.png b/vendor/stb/data/herringbone/template_caves_limit_connectivity.png
new file mode 100644
index 0000000..1c286e7
Binary files /dev/null and b/vendor/stb/data/herringbone/template_caves_limit_connectivity.png differ
diff --git a/vendor/stb/data/herringbone/template_caves_tiny_corridors.png b/vendor/stb/data/herringbone/template_caves_tiny_corridors.png
new file mode 100644
index 0000000..e9b0d44
Binary files /dev/null and b/vendor/stb/data/herringbone/template_caves_tiny_corridors.png differ
diff --git a/vendor/stb/data/herringbone/template_corner_caves.png b/vendor/stb/data/herringbone/template_corner_caves.png
new file mode 100644
index 0000000..73421e9
Binary files /dev/null and b/vendor/stb/data/herringbone/template_corner_caves.png differ
diff --git a/vendor/stb/data/herringbone/template_horizontal_corridors_v1.png b/vendor/stb/data/herringbone/template_horizontal_corridors_v1.png
new file mode 100644
index 0000000..c14380d
Binary files /dev/null and b/vendor/stb/data/herringbone/template_horizontal_corridors_v1.png differ
diff --git a/vendor/stb/data/herringbone/template_horizontal_corridors_v2.png b/vendor/stb/data/herringbone/template_horizontal_corridors_v2.png
new file mode 100644
index 0000000..8a35bec
Binary files /dev/null and b/vendor/stb/data/herringbone/template_horizontal_corridors_v2.png differ
diff --git a/vendor/stb/data/herringbone/template_horizontal_corridors_v3.png b/vendor/stb/data/herringbone/template_horizontal_corridors_v3.png
new file mode 100644
index 0000000..f921807
Binary files /dev/null and b/vendor/stb/data/herringbone/template_horizontal_corridors_v3.png differ
diff --git a/vendor/stb/data/herringbone/template_limit_connectivity_fat.png b/vendor/stb/data/herringbone/template_limit_connectivity_fat.png
new file mode 100644
index 0000000..dda1302
Binary files /dev/null and b/vendor/stb/data/herringbone/template_limit_connectivity_fat.png differ
diff --git a/vendor/stb/data/herringbone/template_limited_connectivity.png b/vendor/stb/data/herringbone/template_limited_connectivity.png
new file mode 100644
index 0000000..d9f97c9
Binary files /dev/null and b/vendor/stb/data/herringbone/template_limited_connectivity.png differ
diff --git a/vendor/stb/data/herringbone/template_maze_2_wide.png b/vendor/stb/data/herringbone/template_maze_2_wide.png
new file mode 100644
index 0000000..0e5bfad
Binary files /dev/null and b/vendor/stb/data/herringbone/template_maze_2_wide.png differ
diff --git a/vendor/stb/data/herringbone/template_maze_plus_2_wide.png b/vendor/stb/data/herringbone/template_maze_plus_2_wide.png
new file mode 100644
index 0000000..d27f7bd
Binary files /dev/null and b/vendor/stb/data/herringbone/template_maze_plus_2_wide.png differ
diff --git a/vendor/stb/data/herringbone/template_open_areas.png b/vendor/stb/data/herringbone/template_open_areas.png
new file mode 100644
index 0000000..7ac4eab
Binary files /dev/null and b/vendor/stb/data/herringbone/template_open_areas.png differ
diff --git a/vendor/stb/data/herringbone/template_ref2_corner_caves.png b/vendor/stb/data/herringbone/template_ref2_corner_caves.png
new file mode 100644
index 0000000..b6c70c9
Binary files /dev/null and b/vendor/stb/data/herringbone/template_ref2_corner_caves.png differ
diff --git a/vendor/stb/data/herringbone/template_rooms_and_corridors.png b/vendor/stb/data/herringbone/template_rooms_and_corridors.png
new file mode 100644
index 0000000..c0467f3
Binary files /dev/null and b/vendor/stb/data/herringbone/template_rooms_and_corridors.png differ
diff --git a/vendor/stb/data/herringbone/template_rooms_and_corridors_2_wide_diagonal_bias.png b/vendor/stb/data/herringbone/template_rooms_and_corridors_2_wide_diagonal_bias.png
new file mode 100644
index 0000000..45c669c
Binary files /dev/null and b/vendor/stb/data/herringbone/template_rooms_and_corridors_2_wide_diagonal_bias.png differ
diff --git a/vendor/stb/data/herringbone/template_rooms_limit_connectivity.png b/vendor/stb/data/herringbone/template_rooms_limit_connectivity.png
new file mode 100644
index 0000000..e07599e
Binary files /dev/null and b/vendor/stb/data/herringbone/template_rooms_limit_connectivity.png differ
diff --git a/vendor/stb/data/herringbone/template_round_rooms_diagonal_corridors.png b/vendor/stb/data/herringbone/template_round_rooms_diagonal_corridors.png
new file mode 100644
index 0000000..2073f98
Binary files /dev/null and b/vendor/stb/data/herringbone/template_round_rooms_diagonal_corridors.png differ
diff --git a/vendor/stb/data/herringbone/template_sean_dungeon.png b/vendor/stb/data/herringbone/template_sean_dungeon.png
new file mode 100644
index 0000000..5be3b24
Binary files /dev/null and b/vendor/stb/data/herringbone/template_sean_dungeon.png differ
diff --git a/vendor/stb/data/herringbone/template_simple_caves_2_wide.png b/vendor/stb/data/herringbone/template_simple_caves_2_wide.png
new file mode 100644
index 0000000..3217271
Binary files /dev/null and b/vendor/stb/data/herringbone/template_simple_caves_2_wide.png differ
diff --git a/vendor/stb/data/herringbone/template_square_rooms_with_random_rects.png b/vendor/stb/data/herringbone/template_square_rooms_with_random_rects.png
new file mode 100644
index 0000000..0d7e82e
Binary files /dev/null and b/vendor/stb/data/herringbone/template_square_rooms_with_random_rects.png differ
diff --git a/vendor/stb/data/map_01.png b/vendor/stb/data/map_01.png
new file mode 100644
index 0000000..2da3f5c
Binary files /dev/null and b/vendor/stb/data/map_01.png differ
diff --git a/vendor/stb/data/map_02.png b/vendor/stb/data/map_02.png
new file mode 100644
index 0000000..461796b
Binary files /dev/null and b/vendor/stb/data/map_02.png differ
diff --git a/vendor/stb/data/map_03.png b/vendor/stb/data/map_03.png
new file mode 100644
index 0000000..3aebcb9
Binary files /dev/null and b/vendor/stb/data/map_03.png differ
diff --git a/vendor/stb/deprecated/rrsprintf.h b/vendor/stb/deprecated/rrsprintf.h
new file mode 100644
index 0000000..62962e3
--- /dev/null
+++ b/vendor/stb/deprecated/rrsprintf.h
@@ -0,0 +1,1055 @@
+#ifndef RR_SPRINTF_H_INCLUDE
+#define RR_SPRINTF_H_INCLUDE
+
+/*
+Single file sprintf replacement.
+
+Originally written by Jeff Roberts at RAD Game Tools - 2015/10/20.
+Hereby placed in public domain.
+
+This is a full sprintf replacement that supports everything that
+the C runtime sprintfs support, including float/double, 64-bit integers,
+hex floats, field parameters (%*.*d stuff), length read-backs, etc.
+
+Why would you need this if sprintf already exists? Well, first off,
+it's *much* faster (see below). It's also much smaller than the CRT
+versions code-space-wise. We've also added some simple improvements
+that are super handy (commas in thousands, callbacks at buffer full,
+for example). Finally, the format strings for MSVC and GCC differ
+for 64-bit integers (among other small things), so this lets you use
+the same format strings in cross platform code.
+
+It uses the standard single file trick of being both the header file
+and the source itself. If you just include it normally, you just get
+the header file function definitions. To get the code, you include
+it from a C or C++ file and define RR_SPRINTF_IMPLEMENTATION first.
+
+It only uses the va_arg macros from the C runtime to do its work. It
+does cast doubles to S64s and shifts and divides U64s, which does
+drag in CRT code on most platforms.
+
+It compiles to roughly 8K with float support, and 4K without.
+As a comparison, when using MSVC static libs, calling sprintf drags
+in 16K.
+
+API:
+====
+int rrsprintf( char * buf, char const * fmt, ... )
+int rrsnprintf( char * buf, int count, char const * fmt, ... )
+ Convert an arg list into a buffer. rrsnprintf always returns
+ a zero-terminated string (unlike regular snprintf).
+
+int rrvsprintf( char * buf, char const * fmt, va_list va )
+int rrvsnprintf( char * buf, int count, char const * fmt, va_list va )
+ Convert a va_list arg list into a buffer. rrvsnprintf always returns
+ a zero-terminated string (unlike regular snprintf).
+
+int rrvsprintfcb( RRSPRINTFCB * callback, void * user, char * buf, char const * fmt, va_list va )
+ typedef char * RRSPRINTFCB( char const * buf, void * user, int len );
+ Convert into a buffer, calling back every RR_SPRINTF_MIN chars.
+ Your callback can then copy the chars out, print them or whatever.
+ This function is actually the workhorse for everything else.
+ The buffer you pass in must hold at least RR_SPRINTF_MIN characters.
+ // you return the next buffer to use or 0 to stop converting
+
+void rrsetseparators( char comma, char period )
+ Set the comma and period characters to use.
+
+FLOATS/DOUBLES:
+===============
+This code uses an internal float->ascii conversion method that uses
+doubles with error correction (double-doubles, for ~105 bits of
+precision). This conversion is round-trip perfect - that is, an atof
+of the values output here will give you the bit-exact double back.
+
+One difference is that our insignificant digits will be different than
+with MSVC or GCC (but they don't match each other either). We also
+don't attempt to find the minimum length matching float (pre-MSVC15
+doesn't either).
+
+If you don't need floats or doubles at all, define RR_SPRINTF_NOFLOAT
+and you'll save 4K of code space.
+
+64-BIT INTS:
+============
+This library also supports 64-bit integers and you can use MSVC style or
+GCC style indicators (%I64d or %lld). It supports the C99 specifiers
+for size_t and ptrdiff_t (%jd %zd) as well.
+
+EXTRAS:
+=======
+Like some GCCs, for integers and floats, you can use a ' (single quote)
+specifier and commas will be inserted on the thousands: "%'d" on 12345
+would print 12,345.
+
+For integers and floats, you can use a "$" specifier and the number
+will be converted to float and then divided to get kilo, mega, giga or
+tera and then printed, so "%$d" 1024 is "1.0 k", "%$.2d" 2536000 is
+"2.42 m", etc.
+
+In addition to octal and hexadecimal conversions, you can print
+integers in binary: "%b" for 256 would print 100000000.
+
+PERFORMANCE vs MSVC 2008 32-/64-bit (GCC is even slower than MSVC):
+===================================================================
+"%d" across all 32-bit ints (4.8x/4.0x faster than 32-/64-bit MSVC)
+"%24d" across all 32-bit ints (4.5x/4.2x faster)
+"%x" across all 32-bit ints (4.5x/3.8x faster)
+"%08x" across all 32-bit ints (4.3x/3.8x faster)
+"%f" across e-10 to e+10 floats (7.3x/6.0x faster)
+"%e" across e-10 to e+10 floats (8.1x/6.0x faster)
+"%g" across e-10 to e+10 floats (10.0x/7.1x faster)
+"%f" for values near e-300 (7.9x/6.5x faster)
+"%f" for values near e+300 (10.0x/9.1x faster)
+"%e" for values near e-300 (10.1x/7.0x faster)
+"%e" for values near e+300 (9.2x/6.0x faster)
+"%.320f" for values near e-300 (12.6x/11.2x faster)
+"%a" for random values (8.6x/4.3x faster)
+"%I64d" for 64-bits with 32-bit values (4.8x/3.4x faster)
+"%I64d" for 64-bits > 32-bit values (4.9x/5.5x faster)
+"%s%s%s" for 64 char strings (7.1x/7.3x faster)
+"...512 char string..." ( 35.0x/32.5x faster!)
+*/
+
+#ifdef RR_SPRINTF_STATIC
+#define RRPUBLIC_DEC static
+#define RRPUBLIC_DEF static
+#else
+#ifdef __cplusplus
+#define RRPUBLIC_DEC extern "C"
+#define RRPUBLIC_DEF extern "C"
+#else
+#define RRPUBLIC_DEC extern
+#define RRPUBLIC_DEF
+#endif
+#endif
+
+#include <stdarg.h> // for va_list()
+
+#ifndef RR_SPRINTF_MIN
+#define RR_SPRINTF_MIN 512 // how many characters per callback
+#endif
+typedef char * RRSPRINTFCB( char * buf, void * user, int len );
+
+#ifndef RR_SPRINTF_DECORATE
+#define RR_SPRINTF_DECORATE(name) rr##name // define this before including if you want to change the names
+#endif
+
+#ifndef RR_SPRINTF_IMPLEMENTATION
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( vsprintf )( char * buf, char const * fmt, va_list va );
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( vsnprintf )( char * buf, int count, char const * fmt, va_list va );
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( sprintf ) ( char * buf, char const * fmt, ... );
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( snprintf )( char * buf, int count, char const * fmt, ... );
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( vsprintfcb )( RRSPRINTFCB * callback, void * user, char * buf, char const * fmt, va_list va );
+RRPUBLIC_DEF void RR_SPRINTF_DECORATE( setseparators )( char comma, char period );
+
+#else
+
+#include <stdarg.h> // for va_arg()
+
+#define rU32 unsigned int
+#define rS32 signed int
+
+#ifdef _MSC_VER
+#define rU64 unsigned __int64
+#define rS64 signed __int64
+#else
+#define rU64 unsigned long long
+#define rS64 signed long long
+#endif
+#define rU16 unsigned short
+
+#ifndef rUINTa
+#if defined(__ppc64__) || defined(__aarch64__) || defined(_M_X64) || defined(__x86_64__) || defined(__x86_64)
+#define rUINTa rU64
+#else
+#define rUINTa rU32
+#endif
+#endif
+
+#ifndef RR_SPRINTF_MSVC_MODE // used for MSVC2013 and earlier (MSVC2015 matches GCC)
+#if defined(_MSC_VER) && (_MSC_VER<1900)
+#define RR_SPRINTF_MSVC_MODE
+#endif
+#endif
+
+#ifdef RR_SPRINTF_NOUNALIGNED // define this before inclusion to force rrsprint to always use aligned accesses
+#define RR_UNALIGNED(code)
+#else
+#define RR_UNALIGNED(code) code
+#endif
+
+#ifndef RR_SPRINTF_NOFLOAT
+// internal float utility functions
+static rS32 rrreal_to_str( char const * * start, rU32 * len, char *out, rS32 * decimal_pos, double value, rU32 frac_digits );
+static rS32 rrreal_to_parts( rS64 * bits, rS32 * expo, double value );
+#define RRSPECIAL 0x7000
+#endif
+
+static char RRperiod='.';
+static char RRcomma=',';
+static char rrdiglookup[201]="00010203040506070809101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899";
+
+RRPUBLIC_DEF void RR_SPRINTF_DECORATE( setseparators )( char pcomma, char pperiod )
+{
+ RRperiod=pperiod;
+ RRcomma=pcomma;
+}
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( vsprintfcb )( RRSPRINTFCB * callback, void * user, char * buf, char const * fmt, va_list va )
+{
+ static char hex[]="0123456789abcdefxp";
+ static char hexu[]="0123456789ABCDEFXP";
+ char * bf;
+ char const * f;
+ int tlen = 0;
+
+ bf = buf;
+ f = fmt;
+ for(;;)
+ {
+ rS32 fw,pr,tz; rU32 fl;
+
+ #define LJ 1
+ #define LP 2
+ #define LS 4
+ #define LX 8
+ #define LZ 16
+ #define BI 32
+ #define CS 64
+ #define NG 128
+ #define KI 256
+ #define HW 512
+
+ // macros for the callback buffer stuff
+ #define chk_cb_bufL(bytes) { int len = (int)(bf-buf); if ((len+(bytes))>=RR_SPRINTF_MIN) { tlen+=len; if (0==(bf=buf=callback(buf,user,len))) goto done; } }
+ #define chk_cb_buf(bytes) { if ( callback ) { chk_cb_bufL(bytes); } }
+ #define flush_cb() { chk_cb_bufL(RR_SPRINTF_MIN-1); } //flush if there is even one byte in the buffer
+ #define cb_buf_clamp(cl,v) cl = v; if ( callback ) { int lg = RR_SPRINTF_MIN-(int)(bf-buf); if (cl>lg) cl=lg; }
+
+ // fast copy everything up to the next % (or end of string)
+ for(;;)
+ {
+ while (((rUINTa)f)&3)
+ {
+ schk1: if (f[0]=='%') goto scandd;
+ schk2: if (f[0]==0) goto endfmt;
+ chk_cb_buf(1); *bf++=f[0]; ++f;
+ }
+ for(;;)
+ {
+ rU32 v,c;
+ v=*(rU32*)f; c=(~v)&0x80808080;
+ if ((v-0x26262626)&c) goto schk1;
+ if ((v-0x01010101)&c) goto schk2;
+ if (callback) if ((RR_SPRINTF_MIN-(int)(bf-buf))<4) goto schk1;
+ *(rU32*)bf=v; bf+=4; f+=4;
+ }
+ } scandd:
+
+ ++f;
+
+ // ok, we have a percent, read the modifiers first
+ fw = 0; pr = -1; fl = 0; tz = 0;
+
+ // flags
+ for(;;)
+ {
+ switch(f[0])
+ {
+ // if we have left justify
+ case '-': fl|=LJ; ++f; continue;
+ // if we have leading plus
+ case '+': fl|=LP; ++f; continue;
+ // if we have leading space
+ case ' ': fl|=LS; ++f; continue;
+ // if we have leading 0x
+ case '#': fl|=LX; ++f; continue;
+ // if we have thousand commas
+ case '\'': fl|=CS; ++f; continue;
+ // if we have kilo marker
+ case '$': fl|=KI; ++f; continue;
+ // if we have leading zero
+ case '0': fl|=LZ; ++f; goto flags_done;
+ default: goto flags_done;
+ }
+ }
+ flags_done:
+
+ // get the field width
+ if ( f[0] == '*' ) {fw = va_arg(va,rU32); ++f;} else { while (( f[0] >= '0' ) && ( f[0] <= '9' )) { fw = fw * 10 + f[0] - '0'; f++; } }
+ // get the precision
+ if ( f[0]=='.' ) { ++f; if ( f[0] == '*' ) {pr = va_arg(va,rU32); ++f;} else { pr = 0; while (( f[0] >= '0' ) && ( f[0] <= '9' )) { pr = pr * 10 + f[0] - '0'; f++; } } }
+
+ // handle integer size overrides
+ switch(f[0])
+ {
+ // are we halfwidth?
+ case 'h': fl|=HW; ++f; break;
+ // are we 64-bit (unix style)
+ case 'l': ++f; if ( f[0]=='l') { fl|=BI; ++f; } break;
+ // are we 64-bit on intmax? (c99)
+ case 'j': fl|=BI; ++f; break;
+ // are we 64-bit on size_t or ptrdiff_t? (c99)
+ case 'z': case 't': fl|=((sizeof(char*)==8)?BI:0); ++f; break;
+ // are we 64-bit (msft style)
+ case 'I': if ( ( f[1]=='6') && ( f[2]=='4') ) { fl|=BI; f+=3; } else if ( ( f[1]=='3') && ( f[2]=='2') ) { f+=3; } else { fl|=((sizeof(void*)==8)?BI:0); ++f; } break;
+ default: break;
+ }
+
+ // handle each replacement
+ switch( f[0] )
+ {
+ #define NUMSZ 512 // big enough for e308 (with commas) or e-307
+ char num[NUMSZ];
+ char lead[8];
+ char tail[8];
+ char *s;
+ char const *h;
+ rU32 l,n,cs;
+ rU64 n64;
+ #ifndef RR_SPRINTF_NOFLOAT
+ double fv;
+ #endif
+ rS32 dp; char const * sn;
+
+ case 's':
+ // get the string
+ s = va_arg(va,char*); if (s==0) s = (char*)"null";
+ // get the length
+ sn = s;
+ for(;;)
+ {
+ if ((((rUINTa)sn)&3)==0) break;
+ lchk:
+ if (sn[0]==0) goto ld;
+ ++sn;
+ }
+ n = 0xffffffff;
+ if (pr>=0) { n=(rU32)(sn-s); if (n>=(rU32)pr) goto ld; n=((rU32)(pr-n))>>2; }
+ while(n)
+ {
+ rU32 v=*(rU32*)sn;
+ if ((v-0x01010101)&(~v)&0x80808080UL) goto lchk;
+ sn+=4;
+ --n;
+ }
+ goto lchk;
+ ld:
+
+ l = (rU32) ( sn - s );
+ // clamp to precision
+ if ( l > (rU32)pr ) l = pr;
+ lead[0]=0; tail[0]=0; pr = 0; dp = 0; cs = 0;
+ // copy the string in
+ goto scopy;
+
+ case 'c': // char
+ // get the character
+ s = num + NUMSZ -1; *s = (char)va_arg(va,int);
+ l = 1;
+ lead[0]=0; tail[0]=0; pr = 0; dp = 0; cs = 0;
+ goto scopy;
+
+ case 'n': // weird write-bytes specifier
+ { int * d = va_arg(va,int*);
+ *d = tlen + (int)( bf - buf ); }
+ break;
+
+#ifdef RR_SPRINTF_NOFLOAT
+ case 'A': // float
+ case 'a': // hex float
+ case 'G': // float
+ case 'g': // float
+ case 'E': // float
+ case 'e': // float
+ case 'f': // float
+ va_arg(va,double); // eat it
+ s = (char*)"No float";
+ l = 8;
+ lead[0]=0; tail[0]=0; pr = 0; dp = 0; cs = 0;
+ goto scopy;
+#else
+ case 'A': // upper hex float
+ h=hexu;
+ goto hexfloat;
+
+ case 'a': // hex float
+ h=hex;
+ hexfloat:
+ fv = va_arg(va,double);
+ if (pr==-1) pr=6; // default is 6
+ // read the double into a string
+ if ( rrreal_to_parts( (rS64*)&n64, &dp, fv ) )
+ fl |= NG;
+
+ s = num+64;
+
+ // sign
+ lead[0]=0; if (fl&NG) { lead[0]=1; lead[1]='-'; } else if (fl&LS) { lead[0]=1; lead[1]=' '; } else if (fl&LP) { lead[0]=1; lead[1]='+'; };
+
+ if (dp==-1023) dp=(n64)?-1022:0; else n64|=(((rU64)1)<<52);
+ n64<<=(64-56);
+ if (pr<15) n64+=((((rU64)8)<<56)>>(pr*4));
+ // add leading chars
+
+ #ifdef RR_SPRINTF_MSVC_MODE
+ *s++='0';*s++='x';
+ #else
+ lead[1+lead[0]]='0'; lead[2+lead[0]]='x'; lead[0]+=2;
+ #endif
+ *s++=h[(n64>>60)&15]; n64<<=4;
+ if ( pr ) *s++=RRperiod;
+ sn = s;
+
+ // print the bits
+ n = pr; if (n>13) n = 13; if (pr>(rS32)n) tz=pr-n; pr = 0;
+ while(n--) { *s++=h[(n64>>60)&15]; n64<<=4; }
+
+ // print the expo
+ tail[1]=h[17];
+ if (dp<0) { tail[2]='-'; dp=-dp;} else tail[2]='+';
+ n = (dp>=1000)?6:((dp>=100)?5:((dp>=10)?4:3));
+ tail[0]=(char)n;
+ for(;;) { tail[n]='0'+dp%10; if (n<=3) break; --n; dp/=10; }
+
+ dp = (int)(s-sn);
+ l = (int)(s-(num+64));
+ s = num+64;
+ cs = 1 + (3<<24);
+ goto scopy;
+
+ case 'G': // float
+ h=hexu;
+ goto dosmallfloat;
+
+ case 'g': // float
+ h=hex;
+ dosmallfloat:
+ fv = va_arg(va,double);
+ if (pr==-1) pr=6; else if (pr==0) pr = 1; // default is 6
+ // read the double into a string
+ if ( rrreal_to_str( &sn, &l, num, &dp, fv, (pr-1)|0x80000000 ) )
+ fl |= NG;
+
+ // clamp the precision and delete extra zeros after clamp
+ n = pr;
+ if ( l > (rU32)pr ) l = pr; while ((l>1)&&(pr)&&(sn[l-1]=='0')) { --pr; --l; }
+
+ // should we use %e
+ if ((dp<=-4)||(dp>(rS32)n))
+ {
+ if ( pr > (rS32)l ) pr = l-1; else if ( pr ) --pr; // when using %e, there is one digit before the decimal
+ goto doexpfromg;
+ }
+ // this is the insane action to get the pr to match %g semantics for %f
+ if(dp>0) { pr=(dp<(rS32)l)?l-dp:0; } else { pr = -dp+((pr>(rS32)l)?l:pr); }
+ goto dofloatfromg;
+
+ case 'E': // float
+ h=hexu;
+ goto doexp;
+
+ case 'e': // float
+ h=hex;
+ doexp:
+ fv = va_arg(va,double);
+ if (pr==-1) pr=6; // default is 6
+ // read the double into a string
+ if ( rrreal_to_str( &sn, &l, num, &dp, fv, pr|0x80000000 ) )
+ fl |= NG;
+ doexpfromg:
+ tail[0]=0;
+ lead[0]=0; if (fl&NG) { lead[0]=1; lead[1]='-'; } else if (fl&LS) { lead[0]=1; lead[1]=' '; } else if (fl&LP) { lead[0]=1; lead[1]='+'; };
+ if ( dp == RRSPECIAL ) { s=(char*)sn; cs=0; pr=0; goto scopy; }
+ s=num+64;
+ // handle leading chars
+ *s++=sn[0];
+
+ if (pr) *s++=RRperiod;
+
+ // handle after decimal
+ if ((l-1)>(rU32)pr) l=pr+1;
+ for(n=1;n<l;n++) *s++=sn[n];
+ // trailing zeros
+ tz = pr-(l-1); pr = 0;
+ // dump expo
+ tail[1]=h[0xe];
+ dp -= 1;
+ if (dp<0) { tail[2]='-'; dp=-dp;} else tail[2]='+';
+ #ifdef RR_SPRINTF_MSVC_MODE
+ n = 5;
+ #else
+ n = (dp>=100)?5:4;
+ #endif
+ tail[0]=(char)n;
+ for(;;) { tail[n]='0'+dp%10; if (n<=3) break; --n; dp/=10; }
+ cs = 1 + (3<<24); // how many tens
+ goto flt_lead;
+
+ case 'f': // float
+ fv = va_arg(va,double);
+ doafloat:
+ // do kilos
+ if (fl&KI) {while(fl<0x4000000) { if ((fv<1024.0) && (fv>-1024.0)) break; fv/=1024.0; fl+=0x1000000; }}
+ if (pr==-1) pr=6; // default is 6
+ // read the double into a string
+ if ( rrreal_to_str( &sn, &l, num, &dp, fv, pr ) )
+ fl |= NG;
+ dofloatfromg:
+ tail[0]=0;
+ // sign
+ lead[0]=0; if (fl&NG) { lead[0]=1; lead[1]='-'; } else if (fl&LS) { lead[0]=1; lead[1]=' '; } else if (fl&LP) { lead[0]=1; lead[1]='+'; };
+ if ( dp == RRSPECIAL ) { s=(char*)sn; cs=0; pr=0; goto scopy; }
+ s=num+64;
+
+ // handle the three decimal varieties
+ if (dp<=0)
+ {
+ rS32 i;
+ // handle 0.000*000xxxx
+ *s++='0'; if (pr) *s++=RRperiod;
+ n=-dp; if((rS32)n>pr) n=pr; i=n; while(i) { if ((((rUINTa)s)&3)==0) break; *s++='0'; --i; } while(i>=4) { *(rU32*)s=0x30303030; s+=4; i-=4; } while(i) { *s++='0'; --i; }
+ if ((rS32)(l+n)>pr) l=pr-n; i=l; while(i) { *s++=*sn++; --i; }
+ tz = pr-(n+l);
+ cs = 1 + (3<<24); // how many tens did we write (for commas below)
+ }
+ else
+ {
+ cs = (fl&CS)?((600-(rU32)dp)%3):0;
+ if ((rU32)dp>=l)
+ {
+ // handle xxxx000*000.0
+ n=0; for(;;) { if ((fl&CS) && (++cs==4)) { cs = 0; *s++=RRcomma; } else { *s++=sn[n]; ++n; if (n>=l) break; } }
+ if (n<(rU32)dp)
+ {
+ n = dp - n;
+ if ((fl&CS)==0) { while(n) { if ((((rUINTa)s)&3)==0) break; *s++='0'; --n; } while(n>=4) { *(rU32*)s=0x30303030; s+=4; n-=4; } }
+ while(n) { if ((fl&CS) && (++cs==4)) { cs = 0; *s++=RRcomma; } else { *s++='0'; --n; } }
+ }
+ cs = (int)(s-(num+64)) + (3<<24); // cs is how many tens
+ if (pr) { *s++=RRperiod; tz=pr;}
+ }
+ else
+ {
+ // handle xxxxx.xxxx000*000
+ n=0; for(;;) { if ((fl&CS) && (++cs==4)) { cs = 0; *s++=RRcomma; } else { *s++=sn[n]; ++n; if (n>=(rU32)dp) break; } }
+ cs = (int)(s-(num+64)) + (3<<24); // cs is how many tens
+ if (pr) *s++=RRperiod;
+ if ((l-dp)>(rU32)pr) l=pr+dp;
+ while(n<l) { *s++=sn[n]; ++n; }
+ tz = pr-(l-dp);
+ }
+ pr = 0;
+
+ // handle k,m,g,t
+ if (fl&KI) { tail[0]=1; tail[1]=' '; { if (fl>>24) { tail[2]="_kmgt"[fl>>24]; tail[0]=2; } } };
+
+ flt_lead:
+ // get the length that we copied
+ l = (rU32) ( s-(num+64) );
+ s=num+64;
+ goto scopy;
+#endif
+
+ case 'B': // upper binary
+ h = hexu;
+ goto binary;
+
+ case 'b': // lower binary
+ h = hex;
+ binary:
+ lead[0]=0;
+ if (fl&LX) { lead[0]=2;lead[1]='0';lead[2]=h[0xb]; }
+ l=(8<<4)|(1<<8);
+ goto radixnum;
+
+ case 'o': // octal
+ h = hexu;
+ lead[0]=0;
+ if (fl&LX) { lead[0]=1;lead[1]='0'; }
+ l=(3<<4)|(3<<8);
+ goto radixnum;
+
+ case 'p': // pointer
+ fl |= (sizeof(void*)==8)?BI:0;
+ pr = sizeof(void*)*2;
+ fl &= ~LZ; // 'p' only prints the pointer with zeros
+ // drop through to X
+
+ case 'X': // upper hex
+ h = hexu;
+ goto dohexb;
+
+ case 'x': // lower hex
+ h = hex; dohexb:
+ l=(4<<4)|(4<<8);
+ lead[0]=0;
+ if (fl&LX) { lead[0]=2;lead[1]='0';lead[2]=h[16]; }
+ radixnum:
+ // get the number
+ if ( fl&BI )
+ n64 = va_arg(va,rU64);
+ else
+ n64 = va_arg(va,rU32);
+
+ s = num + NUMSZ; dp = 0;
+ // clear tail, and clear leading if value is zero
+ tail[0]=0; if (n64==0) { lead[0]=0; if (pr==0) { l=0; cs = ( ((l>>4)&15)) << 24; goto scopy; } }
+ // convert to string
+ for(;;) { *--s = h[n64&((1<<(l>>8))-1)]; n64>>=(l>>8); if ( ! ( (n64) || ((rS32) ( (num+NUMSZ) - s ) < pr ) ) ) break; if ( fl&CS) { ++l; if ((l&15)==((l>>4)&15)) { l&=~15; *--s=RRcomma; } } };
+ // get the tens and the comma pos
+ cs = (rU32) ( (num+NUMSZ) - s ) + ( ( ((l>>4)&15)) << 24 );
+ // get the length that we copied
+ l = (rU32) ( (num+NUMSZ) - s );
+ // copy it
+ goto scopy;
+
+ case 'u': // unsigned
+ case 'i':
+ case 'd': // integer
+ // get the integer and abs it
+ if ( fl&BI )
+ {
+ rS64 i64 = va_arg(va,rS64); n64 = (rU64)i64; if ((f[0]!='u') && (i64<0)) { n64=(rU64)-i64; fl|=NG; }
+ }
+ else
+ {
+ rS32 i = va_arg(va,rS32); n64 = (rU32)i; if ((f[0]!='u') && (i<0)) { n64=(rU32)-i; fl|=NG; }
+ }
+
+ #ifndef RR_SPRINTF_NOFLOAT
+ if (fl&KI) { if (n64<1024) pr=0; else if (pr==-1) pr=1; fv=(double)(rS64)n64; goto doafloat; }
+ #endif
+
+ // convert to string
+ s = num+NUMSZ; l=0;
+
+ for(;;)
+ {
+ // do in 32-bit chunks (avoid lots of 64-bit divides even with constant denominators)
+ char * o=s-8;
+ if (n64>=100000000) { n = (rU32)( n64 % 100000000); n64 /= 100000000; } else {n = (rU32)n64; n64 = 0; }
+ if((fl&CS)==0) { while(n) { s-=2; *(rU16*)s=*(rU16*)&rrdiglookup[(n%100)*2]; n/=100; } }
+ while (n) { if ( ( fl&CS) && (l++==3) ) { l=0; *--s=RRcomma; --o; } else { *--s=(char)(n%10)+'0'; n/=10; } }
+ if (n64==0) { if ((s[0]=='0') && (s!=(num+NUMSZ))) ++s; break; }
+ while (s!=o) if ( ( fl&CS) && (l++==3) ) { l=0; *--s=RRcomma; --o; } else { *--s='0'; }
+ }
+
+ tail[0]=0;
+ // sign
+ lead[0]=0; if (fl&NG) { lead[0]=1; lead[1]='-'; } else if (fl&LS) { lead[0]=1; lead[1]=' '; } else if (fl&LP) { lead[0]=1; lead[1]='+'; };
+
+ // get the length that we copied
+ l = (rU32) ( (num+NUMSZ) - s ); if ( l == 0 ) { *--s='0'; l = 1; }
+ cs = l + (3<<24);
+ if (pr<0) pr = 0;
+
+ scopy:
+ // get fw=leading/trailing space, pr=leading zeros
+ if (pr<(rS32)l) pr = l;
+ n = pr + lead[0] + tail[0] + tz;
+ if (fw<(rS32)n) fw = n;
+ fw -= n;
+ pr -= l;
+
+ // handle right justify and leading zeros
+ if ( (fl&LJ)==0 )
+ {
+ if (fl&LZ) // if leading zeros, everything is in pr
+ {
+ pr = (fw>pr)?fw:pr;
+ fw = 0;
+ }
+ else
+ {
+ fl &= ~CS; // if no leading zeros, then no commas
+ }
+ }
+
+ // copy the spaces and/or zeros
+ if (fw+pr)
+ {
+ rS32 i; rU32 c;
+
+ // copy leading spaces (or when doing %8.4d stuff)
+ if ( (fl&LJ)==0 ) while(fw>0) { cb_buf_clamp(i,fw); fw -= i; while(i) { if ((((rUINTa)bf)&3)==0) break; *bf++=' '; --i; } while(i>=4) { *(rU32*)bf=0x20202020; bf+=4; i-=4; } while (i) {*bf++=' '; --i;} chk_cb_buf(1); }
+
+ // copy leader
+ sn=lead+1; while(lead[0]) { cb_buf_clamp(i,lead[0]); lead[0] -= (char)i; while (i) {*bf++=*sn++; --i;} chk_cb_buf(1); }
+
+ // copy leading zeros
+ c = cs >> 24; cs &= 0xffffff;
+ cs = (fl&CS)?((rU32)(c-((pr+cs)%(c+1)))):0;
+ while(pr>0) { cb_buf_clamp(i,pr); pr -= i; if((fl&CS)==0) { while(i) { if ((((rUINTa)bf)&3)==0) break; *bf++='0'; --i; } while(i>=4) { *(rU32*)bf=0x30303030; bf+=4; i-=4; } } while (i) { if((fl&CS) && (cs++==c)) { cs = 0; *bf++=RRcomma; } else *bf++='0'; --i; } chk_cb_buf(1); }
+ }
+
+ // copy leader if there is still one
+ sn=lead+1; while(lead[0]) { rS32 i; cb_buf_clamp(i,lead[0]); lead[0] -= (char)i; while (i) {*bf++=*sn++; --i;} chk_cb_buf(1); }
+
+ // copy the string
+ n = l; while (n) { rS32 i; cb_buf_clamp(i,n); n-=i; RR_UNALIGNED( while(i>=4) { *(rU32*)bf=*(rU32*)s; bf+=4; s+=4; i-=4; } ) while (i) {*bf++=*s++; --i;} chk_cb_buf(1); }
+
+ // copy trailing zeros
+ while(tz) { rS32 i; cb_buf_clamp(i,tz); tz -= i; while(i) { if ((((rUINTa)bf)&3)==0) break; *bf++='0'; --i; } while(i>=4) { *(rU32*)bf=0x30303030; bf+=4; i-=4; } while (i) {*bf++='0'; --i;} chk_cb_buf(1); }
+
+ // copy tail if there is one
+ sn=tail+1; while(tail[0]) { rS32 i; cb_buf_clamp(i,tail[0]); tail[0] -= (char)i; while (i) {*bf++=*sn++; --i;} chk_cb_buf(1); }
+
+ // handle the left justify
+ if (fl&LJ) if (fw>0) { while (fw) { rS32 i; cb_buf_clamp(i,fw); fw-=i; while(i) { if ((((rUINTa)bf)&3)==0) break; *bf++=' '; --i; } while(i>=4) { *(rU32*)bf=0x20202020; bf+=4; i-=4; } while (i--) *bf++=' '; chk_cb_buf(1); } }
+ break;
+
+ default: // unknown, just copy code
+ s = num + NUMSZ -1; *s = f[0];
+ l = 1;
+ fw=pr=fl=0;
+ lead[0]=0; tail[0]=0; pr = 0; dp = 0; cs = 0;
+ goto scopy;
+ }
+ ++f;
+ }
+ endfmt:
+
+ if (!callback)
+ *bf = 0;
+ else
+ flush_cb();
+
+ done:
+ return tlen + (int)(bf-buf);
+}
+
+// cleanup
+#undef LJ
+#undef LP
+#undef LS
+#undef LX
+#undef LZ
+#undef BI
+#undef CS
+#undef NG
+#undef KI
+#undef NUMSZ
+#undef chk_cb_bufL
+#undef chk_cb_buf
+#undef flush_cb
+#undef cb_buf_clamp
+
+// ============================================================================
+// wrapper functions
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( sprintf )( char * buf, char const * fmt, ... )
+{
+ va_list va;
+ va_start( va, fmt );
+ return RR_SPRINTF_DECORATE( vsprintfcb )( 0, 0, buf, fmt, va );
+}
+
+typedef struct RRCCS
+{
+ char * buf;
+ int count;
+ char tmp[ RR_SPRINTF_MIN ];
+} RRCCS;
+
+static char * rrclampcallback( char * buf, void * user, int len )
+{
+ RRCCS * c = (RRCCS*)user;
+
+ if ( len > c->count ) len = c->count;
+
+ if (len)
+ {
+ if ( buf != c->buf )
+ {
+ char * s, * d, * se;
+ d = c->buf; s = buf; se = buf+len;
+ do{ *d++ = *s++; } while (s<se);
+ }
+ c->buf += len;
+ c->count -= len;
+ }
+
+ if ( c->count <= 0 ) return 0;
+ return ( c->count >= RR_SPRINTF_MIN ) ? c->buf : c->tmp; // go direct into buffer if you can
+}
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( vsnprintf )( char * buf, int count, char const * fmt, va_list va )
+{
+ RRCCS c;
+ int l;
+
+ if ( count == 0 )
+ return 0;
+
+ c.buf = buf;
+ c.count = count;
+
+ RR_SPRINTF_DECORATE( vsprintfcb )( rrclampcallback, &c, rrclampcallback(0,&c,0), fmt, va );
+
+ // zero-terminate
+ l = (int)( c.buf - buf );
+ if ( l >= count ) // should never be greater, only equal (or less) than count
+ l = count - 1;
+ buf[l] = 0;
+
+ return l;
+}
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( snprintf )( char * buf, int count, char const * fmt, ... )
+{
+ va_list va;
+ va_start( va, fmt );
+
+ return RR_SPRINTF_DECORATE( vsnprintf )( buf, count, fmt, va );
+}
+
+RRPUBLIC_DEF int RR_SPRINTF_DECORATE( vsprintf )( char * buf, char const * fmt, va_list va )
+{
+ return RR_SPRINTF_DECORATE( vsprintfcb )( 0, 0, buf, fmt, va );
+}
+
+// =======================================================================
+// low level float utility functions
+
+#ifndef RR_SPRINTF_NOFLOAT
+
+ // copies d to bits w/ strict aliasing (this compiles to nothing on /Ox)
+ #define RRCOPYFP(dest,src) { int cn; for(cn=0;cn<8;cn++) ((char*)&dest)[cn]=((char*)&src)[cn]; }
+
+// get float info
+static rS32 rrreal_to_parts( rS64 * bits, rS32 * expo, double value )
+{
+ double d;
+ rS64 b = 0;
+
+ // load value and round at the frac_digits
+ d = value;
+
+ RRCOPYFP( b, d );
+
+ *bits = b & ((((rU64)1)<<52)-1);
+ *expo = ((b >> 52) & 2047)-1023;
+
+ return (rS32)(b >> 63);
+}
+
+static double const rrbot[23]={1e+000,1e+001,1e+002,1e+003,1e+004,1e+005,1e+006,1e+007,1e+008,1e+009,1e+010,1e+011,1e+012,1e+013,1e+014,1e+015,1e+016,1e+017,1e+018,1e+019,1e+020,1e+021,1e+022};
+static double const rrnegbot[22]={1e-001,1e-002,1e-003,1e-004,1e-005,1e-006,1e-007,1e-008,1e-009,1e-010,1e-011,1e-012,1e-013,1e-014,1e-015,1e-016,1e-017,1e-018,1e-019,1e-020,1e-021,1e-022};
+static double const rrnegboterr[22]={-5.551115123125783e-018,-2.0816681711721684e-019,-2.0816681711721686e-020,-4.7921736023859299e-021,-8.1803053914031305e-022,4.5251888174113741e-023,4.5251888174113739e-024,-2.0922560830128471e-025,-6.2281591457779853e-026,-3.6432197315497743e-027,6.0503030718060191e-028,2.0113352370744385e-029,-3.0373745563400371e-030,1.1806906454401013e-032,-7.7705399876661076e-032,2.0902213275965398e-033,-7.1542424054621921e-034,-7.1542424054621926e-035,2.4754073164739869e-036,5.4846728545790429e-037,9.2462547772103625e-038,-4.8596774326570872e-039};
+static double const rrtop[13]={1e+023,1e+046,1e+069,1e+092,1e+115,1e+138,1e+161,1e+184,1e+207,1e+230,1e+253,1e+276,1e+299};
+static double const rrnegtop[13]={1e-023,1e-046,1e-069,1e-092,1e-115,1e-138,1e-161,1e-184,1e-207,1e-230,1e-253,1e-276,1e-299};
+static double const rrtoperr[13]={8388608,6.8601809640529717e+028,-7.253143638152921e+052,-4.3377296974619174e+075,-1.5559416129466825e+098,-3.2841562489204913e+121,-3.7745893248228135e+144,-1.7356668416969134e+167,-3.8893577551088374e+190,-9.9566444326005119e+213,6.3641293062232429e+236,-5.2069140800249813e+259,-5.2504760255204387e+282};
+static double const rrnegtoperr[13]={3.9565301985100693e-040,-2.299904345391321e-063,3.6506201437945798e-086,1.1875228833981544e-109,-5.0644902316928607e-132,-6.7156837247865426e-155,-2.812077463003139e-178,-5.7778912386589953e-201,7.4997100559334532e-224,-4.6439668915134491e-247,-6.3691100762962136e-270,-9.436808465446358e-293,8.0970921678014997e-317};
+
+#if defined(_MSC_VER) && (_MSC_VER<=1200)
+static rU64 const rrpot[20]={1,10,100,1000, 10000,100000,1000000,10000000, 100000000,1000000000,10000000000,100000000000, 1000000000000,10000000000000,100000000000000,1000000000000000, 10000000000000000,100000000000000000,1000000000000000000,10000000000000000000U };
+#define rrtento19th ((rU64)1000000000000000000)
+#else
+static rU64 const rrpot[20]={1,10,100,1000, 10000,100000,1000000,10000000, 100000000,1000000000,10000000000ULL,100000000000ULL, 1000000000000ULL,10000000000000ULL,100000000000000ULL,1000000000000000ULL, 10000000000000000ULL,100000000000000000ULL,1000000000000000000ULL,10000000000000000000ULL };
+#define rrtento19th (1000000000000000000ULL)
+#endif
+
+#define rrddmulthi(oh,ol,xh,yh) \
+{ \
+ double ahi=0,alo,bhi=0,blo; \
+ rS64 bt; \
+ oh = xh * yh; \
+ RRCOPYFP(bt,xh); bt&=((~(rU64)0)<<27); RRCOPYFP(ahi,bt); alo = xh-ahi; \
+ RRCOPYFP(bt,yh); bt&=((~(rU64)0)<<27); RRCOPYFP(bhi,bt); blo = yh-bhi; \
+ ol = ((ahi*bhi-oh)+ahi*blo+alo*bhi)+alo*blo; \
+}
+
+#define rrddtoS64(ob,xh,xl) \
+{ \
+ double ahi=0,alo,vh,t;\
+ ob = (rS64)xh;\
+ vh=(double)ob;\
+ ahi = ( xh - vh );\
+ t = ( ahi - xh );\
+ alo = (xh-(ahi-t))-(vh+t);\
+ ob += (rS64)(ahi+alo+xl);\
+}
+
+
+#define rrddrenorm(oh,ol) { double s; s=oh+ol; ol=ol-(s-oh); oh=s; }
+
+#define rrddmultlo(oh,ol,xh,xl,yh,yl) \
+ ol = ol + ( xh*yl + xl*yh ); \
+
+#define rrddmultlos(oh,ol,xh,yl) \
+ ol = ol + ( xh*yl ); \
+
+static void rrraise_to_power10( double *ohi, double *olo, double d, rS32 power ) // power can be -323 to +350
+{
+ double ph, pl;
+ if ((power>=0) && (power<=22))
+ {
+ rrddmulthi(ph,pl,d,rrbot[power]);
+ }
+ else
+ {
+ rS32 e,et,eb;
+ double p2h,p2l;
+
+ e=power; if (power<0) e=-e;
+ et = (e*0x2c9)>>14;/* %23 */ if (et>13) et=13; eb = e-(et*23);
+
+ ph = d; pl = 0.0;
+ if (power<0)
+ {
+ if (eb) { --eb; rrddmulthi(ph,pl,d,rrnegbot[eb]); rrddmultlos(ph,pl,d,rrnegboterr[eb]); }
+ if (et)
+ {
+ rrddrenorm(ph,pl);
+ --et; rrddmulthi(p2h,p2l,ph,rrnegtop[et]); rrddmultlo(p2h,p2l,ph,pl,rrnegtop[et],rrnegtoperr[et]); ph=p2h;pl=p2l;
+ }
+ }
+ else
+ {
+ if (eb)
+ {
+ e = eb; if (eb>22) eb=22; e -= eb;
+ rrddmulthi(ph,pl,d,rrbot[eb]);
+ if ( e ) { rrddrenorm(ph,pl); rrddmulthi(p2h,p2l,ph,rrbot[e]); rrddmultlos(p2h,p2l,rrbot[e],pl); ph=p2h;pl=p2l; }
+ }
+ if (et)
+ {
+ rrddrenorm(ph,pl);
+ --et; rrddmulthi(p2h,p2l,ph,rrtop[et]); rrddmultlo(p2h,p2l,ph,pl,rrtop[et],rrtoperr[et]); ph=p2h;pl=p2l;
+ }
+ }
+ }
+ rrddrenorm(ph,pl);
+ *ohi = ph; *olo = pl;
+}
+
+// given a float value, returns the significant bits in bits, and the position of the
+// decimal point in decimal_pos. +/-INF and NAN are specified by special values
+// returned in the decimal_pos parameter.
+// frac_digits is normally an absolute digit count; OR it with 0x80000000 to count
+// from the first significant digit instead (which is what %g and %e need).
+static rS32 rrreal_to_str( char const * * start, rU32 * len, char *out, rS32 * decimal_pos, double value, rU32 frac_digits )
+{
+ double d;
+ rS64 bits = 0;
+ rS32 expo, e, ng, tens;
+
+ d = value;
+ RRCOPYFP(bits,d);
+ expo = (bits >> 52) & 2047;
+ ng = (rS32)(bits >> 63);
+ if (ng) d=-d;
+
+ if ( expo == 2047 ) // is nan or inf?
+ {
+ *start = (bits&((((rU64)1)<<52)-1)) ? "NaN" : "Inf";
+ *decimal_pos = RRSPECIAL;
+ *len = 3;
+ return ng;
+ }
+
+ if ( expo == 0 ) // is zero or denormal
+ {
+ if ((bits<<1)==0) // do zero
+ {
+ *decimal_pos = 1;
+ *start = out;
+ out[0] = '0'; *len = 1;
+ return ng;
+ }
+ // find the right expo for denormals
+ {
+ rS64 v = ((rU64)1)<<51;
+ while ((bits&v)==0) { --expo; v >>= 1; }
+ }
+ }
+
+ // find the decimal exponent as well as the decimal bits of the value
+ {
+ double ph,pl;
+
+ // log10 estimate - very specifically tweaked to hit or undershoot by no more than 1 of log10 of all expos 1..2046
+ tens=expo-1023; tens = (tens<0)?((tens*617)/2048):(((tens*1233)/4096)+1);
+
+ // move the significant bits into position and stick them into an int
+ rrraise_to_power10( &ph, &pl, d, 18-tens );
+
+ // get full as much precision from double-double as possible
+ rrddtoS64( bits, ph,pl );
+
+ // check if we undershot
+ if ( ((rU64)bits) >= rrtento19th ) ++tens;
+ }
+
+ // now do the rounding in integer land
+ frac_digits = ( frac_digits & 0x80000000 ) ? ( (frac_digits&0x7ffffff) + 1 ) : ( tens + frac_digits );
+ if ( ( frac_digits < 24 ) )
+ {
+ rU32 dg = 1; if ((rU64)bits >= rrpot[9] ) dg=10; while( (rU64)bits >= rrpot[dg] ) { ++dg; if (dg==20) goto noround; }
+ if ( frac_digits < dg )
+ {
+ rU64 r;
+ // add 0.5 at the right position and round
+ e = dg - frac_digits;
+ if ( (rU32)e >= 24 ) goto noround;
+ r = rrpot[e];
+ bits = bits + (r/2);
+ if ( (rU64)bits >= rrpot[dg] ) ++tens;
+ bits /= r;
+ }
+ noround:;
+ }
+
+ // kill long trailing runs of zeros
+ if ( bits )
+ {
+ rU32 n; for(;;) { if ( bits<=0xffffffff ) break; if (bits%1000) goto donez; bits/=1000; } n = (rU32)bits; while ((n%1000)==0) n/=1000; bits=n; donez:;
+ }
+
+ // convert to string
+ out += 64;
+ e = 0;
+ for(;;)
+ {
+ rU32 n;
+ char * o = out-8;
+ // do the conversion in chunks of U32s (avoid most 64-bit divides, worth it, constant denominators be damned)
+ if (bits>=100000000) { n = (rU32)( bits % 100000000); bits /= 100000000; } else {n = (rU32)bits; bits = 0; }
+ while(n) { out-=2; *(rU16*)out=*(rU16*)&rrdiglookup[(n%100)*2]; n/=100; e+=2; }
+ if (bits==0) { if ((e) && (out[0]=='0')) { ++out; --e; } break; }
+ while( out!=o ) { *--out ='0'; ++e; }
+ }
+
+ *decimal_pos = tens;
+ *start = out;
+ *len = e;
+ return ng;
+}
+
+#undef rrddmulthi
+#undef rrddrenorm
+#undef rrddmultlo
+#undef rrddmultlos
+#undef RRSPECIAL
+#undef RRCOPYFP
+
+#endif
+
+// clean up
+#undef rU16
+#undef rU32
+#undef rS32
+#undef rU64
+#undef rS64
+#undef RRPUBLIC_DEC
+#undef RRPUBLIC_DEF
+#undef RR_SPRINTF_DECORATE
+#undef RR_UNALIGNED
+
+#endif
+
+#endif
diff --git a/vendor/stb/deprecated/stb.h b/vendor/stb/deprecated/stb.h
new file mode 100644
index 0000000..1633c3b
--- /dev/null
+++ b/vendor/stb/deprecated/stb.h
@@ -0,0 +1,13111 @@
+/* stb.h - v2.37 - Sean's Tool Box -- public domain -- http://nothings.org/stb.h
+ no warranty is offered or implied; use this code at your own risk
+
+ This is a single header file with a bunch of useful utilities
+ for getting stuff done in C/C++.
+
+ Documentation: http://nothings.org/stb/stb_h.html
+ Unit tests: http://nothings.org/stb/stb.c
+
+ ============================================================================
+ You MUST
+
+ #define STB_DEFINE
+
+ in EXACTLY _one_ C or C++ file that includes this header, BEFORE the
+ include, like this:
+
+ #define STB_DEFINE
+ #include "stb.h"
+
+ All other files should just #include "stb.h" without the #define.
+ ============================================================================
+
+Version History
+
+ 2.36 various fixes
+ 2.35 fix clang-cl issues with swprintf
+ 2.34 fix warnings
+ 2.33 more fixes to random numbers
+ 2.32 stb_intcmprev, stb_uidict, fix random numbers on Linux
+ 2.31 stb_ucharcmp
+ 2.30 MinGW fix
+ 2.29 attempt to fix use of swprintf()
+ 2.28 various new functionality
+ 2.27 test _WIN32 not WIN32 in STB_THREADS
+ 2.26 various warning & bugfixes
+ 2.25 various warning & bugfixes
+ 2.24 various warning & bugfixes
+ 2.23 fix 2.22
+ 2.22 64-bit fixes from '!='; fix stb_sdict_copy() to have preferred name
+ 2.21 utf-8 decoder rejects "overlong" encodings; attempted 64-bit improvements
+ 2.20 fix to hash "copy" function--reported by someone with handle "!="
+ 2.19 ???
+ 2.18 stb_readdir_subdirs_mask
+ 2.17 stb_cfg_dir
+ 2.16 fix stb_bgio_, add stb_bgio_stat(); begin a streaming wrapper
+ 2.15 upgraded hash table template to allow:
+ - aggregate keys (explicit comparison func for EMPTY and DEL keys)
+ - "static" implementations (so they can be culled if unused)
+ 2.14 stb_mprintf
+ 2.13 reduce identifiable strings in STB_NO_STB_STRINGS
+ 2.12 fix STB_ONLY -- lots of uint32s, TRUE/FALSE things had crept in
+ 2.11 fix bug in stb_dirtree_get() which caused "c://path" sorts of stuff
+ 2.10 STB_F(), STB_I() inline constants (also KI,KU,KF,KD)
+ 2.09 stb_box_face_vertex_axis_side
+ 2.08 bugfix stb_trimwhite()
+ 2.07 colored printing in windows (why are we in 1985?)
+ 2.06 comparison functions are now functions-that-return-functions and
+ accept a struct-offset as a parameter (not thread-safe)
+ 2.05 compile and pass tests under Linux (but no threads); thread cleanup
+ 2.04 stb_cubic_bezier_1d, smoothstep, avoid dependency on registry
+ 2.03 ?
+ 2.02 remove integrated documentation
+ 2.01 integrate various fixes; stb_force_uniprocessor
+ 2.00 revised stb_dupe to use multiple hashes
+ 1.99 stb_charcmp
+ 1.98 stb_arr_deleten, stb_arr_insertn
+ 1.97 fix stb_newell_normal()
+ 1.96 stb_hash_number()
+ 1.95 hack stb__rec_max; clean up recursion code to use new functions
+ 1.94 stb_dirtree; rename stb_extra to stb_ptrmap
+ 1.93 stb_sem_new() API cleanup (no blockflag-starts blocked; use 'extra')
+ 1.92 stb_threadqueue--multi reader/writer queue, fixed size or resizeable
+ 1.91 stb_bgio_* for reading disk asynchronously
+ 1.90 stb_mutex uses CRITICAL_REGION; new stb_sync primitive for thread
+ joining; workqueue supports stb_sync instead of stb_semaphore
+ 1.89 support ';' in constant-string wildcards; stb_mutex wrapper (can
+ implement with EnterCriticalRegion eventually)
+ 1.88 portable threading API (only for win32 so far); worker thread queue
+ 1.87 fix wildcard handling in stb_readdir_recursive
+ 1.86 support ';' in wildcards
+ 1.85 make stb_regex work with non-constant strings;
+ beginnings of stb_introspect()
+ 1.84 (forgot to make notes)
+ 1.83 whoops, stb_keep_if_different wasn't deleting the temp file
+ 1.82 bring back stb_compress from stb_file.h for cmirror
+ 1.81 various bugfixes, STB_FASTMALLOC_INIT inits FASTMALLOC in release
+ 1.80 stb_readdir returns utf8; write own utf8-utf16 because lib was wrong
+ 1.79 stb_write
+ 1.78 calloc() support for malloc wrapper, STB_FASTMALLOC
+ 1.77 STB_FASTMALLOC
+ 1.76 STB_STUA - Lua-like language; (stb_image, stb_csample, stb_bilinear)
+ 1.75 alloc/free array of blocks; stb_hheap bug; a few stb_ps_ funcs;
+ hash*getkey, hash*copy; stb_bitset; stb_strnicmp; bugfix stb_bst
+ 1.74 stb_replaceinplace; use stdlib C function to convert utf8 to UTF-16
+ 1.73 fix performance bug & leak in stb_ischar (C++ port lost a 'static')
+ 1.72 remove stb_block, stb_block_manager, stb_decompress (to stb_file.h)
+ 1.71 stb_trimwhite, stb_tokens_nested, etc.
+ 1.70 back out 1.69 because it might problemize mixed builds; stb_filec()
+ 1.69 (stb_file returns 'char *' in C++)
+ 1.68 add a special 'tree root' data type for stb_bst; stb_arr_end
+ 1.67 full C++ port. (stb_block_manager)
+ 1.66 stb_newell_normal
+ 1.65 stb_lex_item_wild -- allow wildcard items which MUST match entirely
+ 1.64 stb_data
+ 1.63 stb_log_name
+ 1.62 stb_define_sort; C++ cleanup
+ 1.61 stb_hash_fast -- Paul Hsieh's hash function (beats Bob Jenkins'?)
+ 1.60 stb_delete_directory_recursive
+ 1.59 stb_readdir_recursive
+ 1.58 stb_bst variant with parent pointer for O(1) iteration, not O(log N)
+ 1.57 replace LCG random with Mersenne Twister (found a public domain one)
+ 1.56 stb_perfect_hash, stb_ischar, stb_regex
+ 1.55 new stb_bst API allows multiple BSTs per node (e.g. secondary keys)
+ 1.54 bugfix: stb_define_hash, stb_wildmatch, regexp
+ 1.53 stb_define_hash; recoded stb_extra, stb_sdict use it
+ 1.52 stb_rand_define, stb_bst, stb_reverse
+ 1.51 fix 'stb_arr_setlen(NULL, 0)'
+ 1.50 stb_wordwrap
+ 1.49 minor improvements to enable the scripting language
+ 1.48 better approach for stb_arr using stb_malloc; more invasive, clearer
+ 1.47 stb_lex (lexes stb.h at 1.5ML/s on 3Ghz P4; 60/70% of optimal/flex)
+ 1.46 stb_wrapper_*, STB_MALLOC_WRAPPER
+ 1.45 lightly tested DFA acceleration of regexp searching
+ 1.44 wildcard matching & searching; regexp matching & searching
+ 1.43 stb_temp
+ 1.42 allow stb_arr to use stb_malloc/realloc; note this is global
+ 1.41 make it compile in C++; (disable stb_arr in C++)
+ 1.40 stb_dupe tweak; stb_swap; stb_substr
+ 1.39 stb_dupe; improve stb_file_max to be less stupid
+ 1.38 stb_sha1_file: generate sha1 for file, even > 4GB
+ 1.37 stb_file_max; partial support for utf8 filenames in Windows
+ 1.36 remove STB__NO_PREFIX - poor interaction with IDE, not worth it
+ streamline stb_arr to make it separately publishable
+ 1.35 bugfixes for stb_sdict, stb_malloc(0), stristr
+ 1.34 (streaming interfaces for stb_compress)
+ 1.33 stb_alloc; bug in stb_getopt; remove stb_overflow
+ 1.32 (stb_compress returns, smaller&faster; encode window & 64-bit len)
+ 1.31 stb_prefix_count
+ 1.30 (STB__NO_PREFIX - remove stb_ prefixes for personal projects)
+ 1.29 stb_fput_varlen64, etc.
+ 1.28 stb_sha1
+ 1.27 ?
+ 1.26 stb_extra
+ 1.25 ?
+ 1.24 stb_copyfile
+ 1.23 stb_readdir
+ 1.22 ?
+ 1.21 ?
+ 1.20 ?
+ 1.19 ?
+ 1.18 ?
+ 1.17 ?
+ 1.16 ?
+ 1.15 stb_fixpath, stb_splitpath, stb_strchr2
+ 1.14 stb_arr
+ 1.13 ?stb, stb_log, stb_fatal
+ 1.12 ?stb_hash2
+ 1.11 miniML
+ 1.10 stb_crc32, stb_adler32
+ 1.09 stb_sdict
+ 1.08 stb_bitreverse, stb_ispow2, stb_big32
+ stb_fopen, stb_fput_varlen, stb_fput_ranged
+ stb_fcmp, stb_feq
+ 1.07 (stb_encompress)
+ 1.06 stb_compress
+ 1.05 stb_tokens, (stb_hheap)
+ 1.04 stb_rand
+ 1.03 ?(s-strings)
+ 1.02 ?stb_filelen, stb_tokens
+ 1.01 stb_tolower
+ 1.00 stb_hash, stb_intcmp
+ stb_file, stb_stringfile, stb_fgets
+ stb_prefix, stb_strlower, stb_strtok
+ stb_image
+ (stb_array), (stb_arena)
+
+Parenthesized items have since been removed.
+
+LICENSE
+
+ See end of file for license information.
+
+CREDITS
+
+ Written by Sean Barrett.
+
+ Fixes:
+ Philipp Wiesemann
+ Robert Nix
+ r-lyeh
+ blackpawn
+ github:Mojofreem
+ Ryan Whitworth
+ Vincent Isambart
+ Mike Sartain
+ Eugene Opalev
+ Tim Sjostrand
+ github:infatum
+ Dave Butler (Croepha)
+ Ethan Lee (flibitijibibo)
+ Brian Collins
+ Kyle Langley
+*/
+
+#include <stdarg.h>   // va_list (used by the stb_vsnprintf declaration below)
+
+#ifndef STB__INCLUDE_STB_H
+#define STB__INCLUDE_STB_H
+
+#define STB_VERSION 1
+
+#ifdef STB_INTROSPECT
+ #define STB_DEFINE
+#endif
+
+#ifdef STB_DEFINE_THREADS
+ #ifndef STB_DEFINE
+ #define STB_DEFINE
+ #endif
+ #ifndef STB_THREADS
+ #define STB_THREADS
+ #endif
+#endif
+
+#if defined(_WIN32) && !defined(__MINGW32__)
+ #ifndef _CRT_SECURE_NO_WARNINGS
+ #define _CRT_SECURE_NO_WARNINGS
+ #endif
+ #ifndef _CRT_NONSTDC_NO_DEPRECATE
+ #define _CRT_NONSTDC_NO_DEPRECATE
+ #endif
+ #ifndef _CRT_NON_CONFORMING_SWPRINTFS
+ #define _CRT_NON_CONFORMING_SWPRINTFS
+ #endif
+ #if !defined(_MSC_VER) || _MSC_VER > 1700
+ #include <intrin.h> // _BitScanReverse
+ #endif
+#endif
+
+#include <stdlib.h> // stdlib could have min/max
+#include <stdio.h> // need FILE
+#include <string.h> // stb_define_hash needs memcpy/memset
+#include <time.h> // stb_dirtree
+#ifdef __MINGW32__
+ #include <fcntl.h> // O_RDWR
+#endif
+
+#ifdef STB_PERSONAL
+ typedef int Bool;
+ #define False 0
+ #define True 1
+#endif
+
+#ifdef STB_MALLOC_WRAPPER_PAGED
+ #define STB_MALLOC_WRAPPER_DEBUG
+#endif
+#ifdef STB_MALLOC_WRAPPER_DEBUG
+ #define STB_MALLOC_WRAPPER
+#endif
+#ifdef STB_MALLOC_WRAPPER_FASTMALLOC
+ #define STB_FASTMALLOC
+ #define STB_MALLOC_WRAPPER
+#endif
+
+#ifdef STB_FASTMALLOC
+ #ifndef _WIN32
+ #undef STB_FASTMALLOC
+ #endif
+#endif
+
+#ifdef STB_DEFINE
+ #include <assert.h>
+ #include <stdarg.h>
+ #include <stddef.h>
+ #include <ctype.h>
+ #include <math.h>
+ #ifndef _WIN32
+ #include <unistd.h>
+ #else
+ #include <io.h> // _mktemp
+ #include <direct.h> // _rmdir
+ #endif
+ #include <sys/types.h> // stat()/_stat()
+ #include <sys/stat.h> // stat()/_stat()
+#endif
+
+#define stb_min(a,b) ((a) < (b) ? (a) : (b))
+#define stb_max(a,b) ((a) > (b) ? (a) : (b))
+
+#ifndef STB_ONLY
+ #if !defined(__cplusplus) && !defined(min) && !defined(max)
+ #define min(x,y) stb_min(x,y)
+ #define max(x,y) stb_max(x,y)
+ #endif
+
+ #ifndef M_PI
+ #define M_PI 3.14159265358979323846f
+ #endif
+
+ #ifndef TRUE
+ #define TRUE 1
+ #define FALSE 0
+ #endif
+
+ #ifndef deg2rad
+ #define deg2rad(a) ((a)*(M_PI/180))
+ #endif
+ #ifndef rad2deg
+ #define rad2deg(a) ((a)*(180/M_PI))
+ #endif
+
+ #ifndef swap
+ #ifndef __cplusplus
+ #define swap(TYPE,a,b) \
+ do { TYPE stb__t; stb__t = (a); (a) = (b); (b) = stb__t; } while (0)
+ #endif
+ #endif
+
+ typedef unsigned char uint8 ;
+ typedef signed char int8 ;
+ typedef unsigned short uint16;
+ typedef signed short int16;
+ #if defined(STB_USE_LONG_FOR_32_BIT_INT) || defined(STB_LONG32)
+ typedef unsigned long uint32;
+ typedef signed long int32;
+ #else
+ typedef unsigned int uint32;
+ typedef signed int int32;
+ #endif
+
+ typedef unsigned char uchar ;
+ typedef unsigned short ushort;
+ typedef unsigned int uint ;
+ typedef unsigned long ulong ;
+
+ // produce compile errors if the sizes aren't right
+ typedef char stb__testsize16[sizeof(int16)==2];
+ typedef char stb__testsize32[sizeof(int32)==4];
+#endif
+
+#ifndef STB_TRUE
+ #define STB_TRUE 1
+ #define STB_FALSE 0
+#endif
+
+// if we're STB_ONLY, can't rely on uint32 or even uint, so all the
+// variables we'll use herein need typenames prefixed with 'stb':
+typedef unsigned char stb_uchar;
+typedef unsigned char stb_uint8;
+typedef unsigned int stb_uint;
+typedef unsigned short stb_uint16;
+typedef short stb_int16;
+typedef signed char stb_int8;
+#if defined(STB_USE_LONG_FOR_32_BIT_INT) || defined(STB_LONG32)
+ typedef unsigned long stb_uint32;
+ typedef long stb_int32;
+#else
+ typedef unsigned int stb_uint32;
+ typedef int stb_int32;
+#endif
+typedef char stb__testsize2_16[sizeof(stb_uint16)==2 ? 1 : -1];
+typedef char stb__testsize2_32[sizeof(stb_uint32)==4 ? 1 : -1];
+
+#ifdef _MSC_VER
+ typedef unsigned __int64 stb_uint64;
+ typedef __int64 stb_int64;
+ #define STB_IMM_UINT64(literalui64) (literalui64##ui64)
+ #define STB_IMM_INT64(literali64) (literali64##i64)
+#else
+ // ??
+ typedef unsigned long long stb_uint64;
+ typedef long long stb_int64;
+ #define STB_IMM_UINT64(literalui64) (literalui64##ULL)
+ #define STB_IMM_INT64(literali64) (literali64##LL)
+#endif
+typedef char stb__testsize2_64[sizeof(stb_uint64)==8 ? 1 : -1];
+
+// add platform-specific ways of checking for sizeof(char*) == 8,
+// and make those define STB_PTR64
+#if defined(_WIN64) || defined(__x86_64__) || defined(__ia64__) || defined(__LP64__)
+ #define STB_PTR64
+#endif
+
+#ifdef STB_PTR64
+typedef char stb__testsize2_ptr[sizeof(char *) == 8];
+typedef stb_uint64 stb_uinta;
+typedef stb_int64 stb_inta;
+#else
+typedef char stb__testsize2_ptr[sizeof(char *) == 4];
+typedef stb_uint32 stb_uinta;
+typedef stb_int32 stb_inta;
+#endif
+typedef char stb__testsize2_uinta[sizeof(stb_uinta)==sizeof(char*) ? 1 : -1];
+
+// if so, we should define an int type that is the pointer size. until then,
+// we'll have to make do with this (which is not the same at all!)
+
+typedef union
+{
+ unsigned int i;
+ void * p;
+} stb_uintptr;
+
+
+#ifdef __cplusplus
+ #define STB_EXTERN extern "C"
+#else
+ #define STB_EXTERN extern
+#endif
+
+// check for well-known debug defines
+#if defined(DEBUG) || defined(_DEBUG) || defined(DBG)
+ #ifndef NDEBUG
+ #define STB_DEBUG
+ #endif
+#endif
+
+#ifdef STB_DEBUG
+ #include <assert.h>
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// C library function platform handling
+//
+
+#ifdef STB_DEFINE
+
+#if defined(_WIN32) && defined(__STDC_WANT_SECURE_LIB__)
+static FILE * stb_p_fopen(const char *filename, const char *mode)
+{
+ FILE *f;
+ if (0 == fopen_s(&f, filename, mode))
+ return f;
+ else
+ return NULL;
+}
+static FILE * stb_p_wfopen(const wchar_t *filename, const wchar_t *mode)
+{
+ FILE *f;
+ if (0 == _wfopen_s(&f, filename, mode))
+ return f;
+ else
+ return NULL;
+}
+static char *stb_p_strcpy_s(char *a, size_t size, const char *b)
+{
+ strcpy_s(a,size,b);
+ return a;
+}
+static char *stb_p_strncpy_s(char *a, size_t size, const char *b, size_t count)
+{
+ strncpy_s(a,size,b,count);
+ return a;
+}
+#define stb_p_mktemp(s) (_mktemp_s(s, strlen(s)+1) == 0)
+#define stb_p_sprintf sprintf_s
+#define stb_p_size(x) ,(x)
+#else
+#define stb_p_fopen fopen
+#define stb_p_wfopen _wfopen
+#define stb_p_strcpy_s(a,s,b) strcpy(a,b)
+#define stb_p_strncpy_s(a,s,b,c) strncpy(a,b,c)
+#define stb_p_mktemp(s) (mktemp(s) != NULL)
+
+#define stb_p_sprintf sprintf
+#define stb_p_size(x)
+#endif
+
+#if defined(_WIN32)
+#define stb_p_vsnprintf _vsnprintf
+#else
+#define stb_p_vsnprintf vsnprintf
+#endif
+#endif // STB_DEFINE
+
+#if defined(_WIN32) && (_MSC_VER >= 1300)
+#define stb_p_stricmp _stricmp
+#define stb_p_strnicmp _strnicmp
+#define stb_p_strdup _strdup
+#else
+#define stb_p_strdup strdup
+#define stb_p_stricmp stricmp
+#define stb_p_strnicmp strnicmp
+#endif
+
+STB_EXTERN void stb_wrapper_malloc(void *newp, size_t sz, char *file, int line);
+STB_EXTERN void stb_wrapper_free(void *oldp, char *file, int line);
+STB_EXTERN void stb_wrapper_realloc(void *oldp, void *newp, size_t sz, char *file, int line);
+STB_EXTERN void stb_wrapper_calloc(size_t num, size_t sz, char *file, int line);
+STB_EXTERN void stb_wrapper_listall(void (*func)(void *ptr, size_t sz, char *file, int line));
+STB_EXTERN void stb_wrapper_dump(char *filename);
+STB_EXTERN size_t stb_wrapper_allocsize(void *oldp);
+STB_EXTERN void stb_wrapper_check(void *oldp);
+
+#ifdef STB_DEFINE
+// this is a special function used inside malloc wrapper
+// to do allocations that aren't tracked (to avoid
+// reentrancy). Of course if someone _else_ wraps realloc,
+// this breaks, but if they're doing that AND the malloc
+// wrapper they need to explicitly check for reentrancy.
+//
+// so we only define realloc_raw(), and implement malloc() as
+// realloc(NULL,sz) and free() as realloc(p,0).
+static void * stb__realloc_raw(void *p, int sz)
+{
+ if (p == NULL) return malloc(sz);
+ if (sz == 0) { free(p); return NULL; }
+ return realloc(p,sz);
+}
+#endif
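The comment above describes funneling all allocation through a single realloc-style entry point so the wrapper has only one untracked path. A standalone sketch of that pattern (the name `xrealloc_raw` is illustrative, not part of stb.h):

```c
#include <stdlib.h>

/* One entry point emulating malloc/free/realloc, mirroring the
 * stb__realloc_raw() convention: a NULL pointer means allocate,
 * a size of 0 means free. */
static void *xrealloc_raw(void *p, size_t sz)
{
    if (p == NULL) return malloc(sz);
    if (sz == 0)  { free(p); return NULL; }
    return realloc(p, sz);
}
```

The explicit `sz == 0` branch matters: plain `realloc(p, 0)` is implementation-defined in C, so the helper frees and returns NULL itself.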
+
+#ifdef _WIN32
+STB_EXTERN void * stb_smalloc(size_t sz);
+STB_EXTERN void stb_sfree(void *p);
+STB_EXTERN void * stb_srealloc(void *p, size_t sz);
+STB_EXTERN void * stb_scalloc(size_t n, size_t sz);
+STB_EXTERN char * stb_sstrdup(char *s);
+#endif
+
+#ifdef STB_FASTMALLOC
+#define malloc stb_smalloc
+#define free stb_sfree
+#define realloc stb_srealloc
+#define strdup stb_sstrdup
+#define calloc stb_scalloc
+#endif
+
+#ifndef STB_MALLOC_ALLCHECK
+ #define stb__check(p) 1
+#else
+ #ifndef STB_MALLOC_WRAPPER
+ #error STB_MALLOC_ALLCHECK requires STB_MALLOC_WRAPPER
+ #else
+ #define stb__check(p) stb_mcheck(p)
+ #endif
+#endif
+
+#ifdef STB_MALLOC_WRAPPER
+ STB_EXTERN void * stb__malloc(size_t, char *, int);
+ STB_EXTERN void * stb__realloc(void *, size_t, char *, int);
+ STB_EXTERN void * stb__calloc(size_t n, size_t s, char *, int);
+ STB_EXTERN void stb__free(void *, char *file, int);
+ STB_EXTERN char * stb__strdup(char *s, char *file, int);
+ STB_EXTERN void stb_malloc_checkall(void);
+ STB_EXTERN void stb_malloc_check_counter(int init_delay, int rep_delay);
+ #ifndef STB_MALLOC_WRAPPER_DEBUG
+ #define stb_mcheck(p) 1
+ #else
+ STB_EXTERN int stb_mcheck(void *);
+ #endif
+
+
+ #ifdef STB_DEFINE
+
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ #define STB__PAD 32
+ #define STB__BIAS 16
+ #define STB__SIG 0x51b01234
+ #define STB__FIXSIZE(sz) (((sz+3) & ~3) + STB__PAD)
+ #define STB__ptr(x,y) ((char *) (x) + (y))
+ #else
+ #define STB__ptr(x,y) (x)
+ #define STB__FIXSIZE(sz) (sz)
+ #endif
+
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ int stb_mcheck(void *p)
+ {
+ unsigned int sz;
+ if (p == NULL) return 1;
+ p = ((char *) p) - STB__BIAS;
+ sz = * (unsigned int *) p;
+ assert(* (unsigned int *) STB__ptr(p,4) == STB__SIG);
+ assert(* (unsigned int *) STB__ptr(p,8) == STB__SIG);
+ assert(* (unsigned int *) STB__ptr(p,12) == STB__SIG);
+ assert(* (unsigned int *) STB__ptr(p,sz-4) == STB__SIG+1);
+ assert(* (unsigned int *) STB__ptr(p,sz-8) == STB__SIG+1);
+ assert(* (unsigned int *) STB__ptr(p,sz-12) == STB__SIG+1);
+ assert(* (unsigned int *) STB__ptr(p,sz-16) == STB__SIG+1);
+ stb_wrapper_check(STB__ptr(p, STB__BIAS));
+ return 1;
+ }
+
+ static void stb__check2(void *p, size_t sz, char *file, int line)
+ {
+ stb_mcheck(p);
+ }
+
+ void stb_malloc_checkall(void)
+ {
+ stb_wrapper_listall(stb__check2);
+ }
+ #else
+ void stb_malloc_checkall(void) { }
+ #endif
+
+ static int stb__malloc_wait=(1 << 30), stb__malloc_next_wait = (1 << 30), stb__malloc_iter;
+ void stb_malloc_check_counter(int init_delay, int rep_delay)
+ {
+ stb__malloc_wait = init_delay;
+ stb__malloc_next_wait = rep_delay;
+ }
+
+ void stb_mcheck_all(void)
+ {
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ ++stb__malloc_iter;
+ if (--stb__malloc_wait <= 0) {
+ stb_malloc_checkall();
+ stb__malloc_wait = stb__malloc_next_wait;
+ }
+ #endif
+ }
+
+ #ifdef STB_MALLOC_WRAPPER_PAGED
+ #define STB__WINDOWS_PAGE (1 << 12)
+ #ifndef _WINDOWS_
+ STB_EXTERN __declspec(dllimport) void * __stdcall VirtualAlloc(void *p, unsigned long size, unsigned long type, unsigned long protect);
+ STB_EXTERN __declspec(dllimport) int __stdcall VirtualFree(void *p, unsigned long size, unsigned long freetype);
+ #endif
+ #endif
+
+ static void *stb__malloc_final(size_t sz)
+ {
+ #ifdef STB_MALLOC_WRAPPER_PAGED
+ size_t aligned = (sz + STB__WINDOWS_PAGE - 1) & ~(STB__WINDOWS_PAGE-1);
+ char *p = VirtualAlloc(NULL, aligned + STB__WINDOWS_PAGE, 0x2000, 0x04); // RESERVE, READWRITE
+ if (p == NULL) return p;
+ VirtualAlloc(p, aligned, 0x1000, 0x04); // COMMIT, READWRITE
+ return p;
+ #else
+ return malloc(sz);
+ #endif
+ }
+
+ static void stb__free_final(void *p)
+ {
+ #ifdef STB_MALLOC_WRAPPER_PAGED
+ VirtualFree(p, 0, 0x8000); // RELEASE
+ #else
+ free(p);
+ #endif
+ }
+
+ int stb__malloc_failure;
+ #ifdef STB_MALLOC_WRAPPER_PAGED
+ static void *stb__realloc_final(void *p, size_t sz, size_t old_sz)
+ {
+ void *q = stb__malloc_final(sz);
+ if (q == NULL)
+ return ++stb__malloc_failure, q;
+ // @TODO: deal with p being smaller!
+ memcpy(q, p, sz < old_sz ? sz : old_sz);
+ stb__free_final(p);
+ return q;
+ }
+ #endif
+
+ void stb__free(void *p, char *file, int line)
+ {
+ stb_mcheck_all();
+ if (!p) return;
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ stb_mcheck(p);
+ #endif
+ stb_wrapper_free(p,file,line);
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ p = STB__ptr(p,-STB__BIAS);
+ * (unsigned int *) STB__ptr(p,0) = 0xdeadbeef;
+ * (unsigned int *) STB__ptr(p,4) = 0xdeadbeef;
+ * (unsigned int *) STB__ptr(p,8) = 0xdeadbeef;
+ * (unsigned int *) STB__ptr(p,12) = 0xdeadbeef;
+ #endif
+ stb__free_final(p);
+ }
+
+ void * stb__malloc(size_t sz, char *file, int line)
+ {
+ void *p;
+ stb_mcheck_all();
+ if (sz == 0) return NULL;
+ p = stb__malloc_final(STB__FIXSIZE(sz));
+ if (p == NULL) p = stb__malloc_final(STB__FIXSIZE(sz));
+ if (p == NULL) p = stb__malloc_final(STB__FIXSIZE(sz));
+ if (p == NULL) {
+ ++stb__malloc_failure;
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ stb_malloc_checkall();
+ #endif
+ return p;
+ }
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ * (int *) STB__ptr(p,0) = STB__FIXSIZE(sz);
+ * (unsigned int *) STB__ptr(p,4) = STB__SIG;
+ * (unsigned int *) STB__ptr(p,8) = STB__SIG;
+ * (unsigned int *) STB__ptr(p,12) = STB__SIG;
+ * (unsigned int *) STB__ptr(p,STB__FIXSIZE(sz)-4) = STB__SIG+1;
+ * (unsigned int *) STB__ptr(p,STB__FIXSIZE(sz)-8) = STB__SIG+1;
+ * (unsigned int *) STB__ptr(p,STB__FIXSIZE(sz)-12) = STB__SIG+1;
+ * (unsigned int *) STB__ptr(p,STB__FIXSIZE(sz)-16) = STB__SIG+1;
+ p = STB__ptr(p, STB__BIAS);
+ #endif
+ stb_wrapper_malloc(p,sz,file,line);
+ return p;
+ }
+
+ void * stb__realloc(void *p, size_t sz, char *file, int line)
+ {
+ void *q;
+
+ stb_mcheck_all();
+ if (p == NULL) return stb__malloc(sz,file,line);
+ if (sz == 0 ) { stb__free(p,file,line); return NULL; }
+
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ stb_mcheck(p);
+ p = STB__ptr(p,-STB__BIAS);
+ #endif
+ #ifdef STB_MALLOC_WRAPPER_PAGED
+ {
+ size_t n = stb_wrapper_allocsize(STB__ptr(p,STB__BIAS));
+ if (!n)
+ stb_wrapper_check(STB__ptr(p,STB__BIAS));
+ q = stb__realloc_final(p, STB__FIXSIZE(sz), STB__FIXSIZE(n));
+ }
+ #else
+ q = realloc(p, STB__FIXSIZE(sz));
+ #endif
+ if (q == NULL)
+ return ++stb__malloc_failure, q;
+ #ifdef STB_MALLOC_WRAPPER_DEBUG
+ * (int *) STB__ptr(q,0) = STB__FIXSIZE(sz);
+ * (unsigned int *) STB__ptr(q,4) = STB__SIG;
+ * (unsigned int *) STB__ptr(q,8) = STB__SIG;
+ * (unsigned int *) STB__ptr(q,12) = STB__SIG;
+ * (unsigned int *) STB__ptr(q,STB__FIXSIZE(sz)-4) = STB__SIG+1;
+ * (unsigned int *) STB__ptr(q,STB__FIXSIZE(sz)-8) = STB__SIG+1;
+ * (unsigned int *) STB__ptr(q,STB__FIXSIZE(sz)-12) = STB__SIG+1;
+ * (unsigned int *) STB__ptr(q,STB__FIXSIZE(sz)-16) = STB__SIG+1;
+
+ q = STB__ptr(q, STB__BIAS);
+ p = STB__ptr(p, STB__BIAS);
+ #endif
+ stb_wrapper_realloc(p,q,sz,file,line);
+ return q;
+ }
+
+ STB_EXTERN int stb_log2_ceil(size_t);
+ static void *stb__calloc(size_t n, size_t sz, char *file, int line)
+ {
+ void *q;
+ stb_mcheck_all();
+ if (n == 0 || sz == 0) return NULL;
+ if (stb_log2_ceil(n) + stb_log2_ceil(sz) >= 32) return NULL;
+ q = stb__malloc(n*sz, file, line);
+ if (q) memset(q, 0, n*sz);
+ return q;
+ }
+
+ char * stb__strdup(char *s, char *file, int line)
+ {
+ char *p;
+ stb_mcheck_all();
+ p = stb__malloc(strlen(s)+1, file, line);
+ if (!p) return p;
+ stb_p_strcpy_s(p, strlen(s)+1, s);
+ return p;
+ }
+ #endif // STB_DEFINE
+
+ #ifdef STB_FASTMALLOC
+ #undef malloc
+ #undef realloc
+ #undef free
+ #undef strdup
+ #undef calloc
+ #endif
+
+ // include everything that might define these, BEFORE making macros
+ #include <stdlib.h>
+ #include <string.h>
+ #include <malloc.h>
+
+ #define malloc(s) stb__malloc ( s, __FILE__, __LINE__)
+ #define realloc(p,s) stb__realloc(p,s, __FILE__, __LINE__)
+ #define calloc(n,s) stb__calloc (n,s, __FILE__, __LINE__)
+ #define free(p) stb__free (p, __FILE__, __LINE__)
+ #define strdup(p) stb__strdup (p, __FILE__, __LINE__)
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Windows pretty display
+//
+
+STB_EXTERN void stbprint(const char *fmt, ...);
+STB_EXTERN char *stb_sprintf(const char *fmt, ...);
+STB_EXTERN char *stb_mprintf(const char *fmt, ...);
+STB_EXTERN int stb_snprintf(char *s, size_t n, const char *fmt, ...);
+STB_EXTERN int stb_vsnprintf(char *s, size_t n, const char *fmt, va_list v);
+
+#ifdef STB_DEFINE
+int stb_vsnprintf(char *s, size_t n, const char *fmt, va_list v)
+{
+ int res;
+ #ifdef _WIN32
+ #ifdef __STDC_WANT_SECURE_LIB__
+ res = _vsnprintf_s(s, n, _TRUNCATE, fmt, v);
+ #else
+ res = stb_p_vsnprintf(s,n,fmt,v);
+ #endif
+ #else
+ res = vsnprintf(s,n,fmt,v);
+ #endif
+ if (n) s[n-1] = 0;
+ // Unix returns length output would require, Windows returns negative when truncated.
+ return (res >= (int) n || res < 0) ? -1 : res;
+}
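stb_vsnprintf above normalizes two divergent return conventions (C99 returns the length the output would have required; old Windows `_vsnprintf` returns a negative value on truncation) into a single rule: -1 on truncation, length otherwise. A self-contained illustration of the same normalization over standard C99 `vsnprintf` (`my_snprintf` is a hypothetical name):

```c
#include <stdarg.h>
#include <stdio.h>

// Returns the number of characters written, or -1 if the output
// did not fit -- the same contract stb_snprintf provides.
static int my_snprintf(char *s, size_t n, const char *fmt, ...)
{
    int res;
    va_list v;
    va_start(v, fmt);
    res = vsnprintf(s, n, fmt, v);   // C99: returns the required length
    va_end(v);
    if (n) s[n-1] = 0;               // always NUL-terminate
    return (res >= (int) n || res < 0) ? -1 : res;
}
```
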
+
+int stb_snprintf(char *s, size_t n, const char *fmt, ...)
+{
+ int res;
+ va_list v;
+ va_start(v,fmt);
+ res = stb_vsnprintf(s, n, fmt, v);
+ va_end(v);
+ return res;
+}
+
+char *stb_sprintf(const char *fmt, ...)
+{
+ static char buffer[1024];
+ va_list v;
+ va_start(v,fmt);
+ stb_vsnprintf(buffer,1024,fmt,v);
+ va_end(v);
+ return buffer;
+}
+
+char *stb_mprintf(const char *fmt, ...)
+{
+ static char buffer[1024];
+ va_list v;
+ va_start(v,fmt);
+ stb_vsnprintf(buffer,1024,fmt,v);
+ va_end(v);
+ return stb_p_strdup(buffer);
+}
+
+#ifdef _WIN32
+
+#ifndef _WINDOWS_
+STB_EXTERN __declspec(dllimport) int __stdcall WriteConsoleA(void *, const void *, unsigned int, unsigned int *, void *);
+STB_EXTERN __declspec(dllimport) void * __stdcall GetStdHandle(unsigned int);
+STB_EXTERN __declspec(dllimport) int __stdcall SetConsoleTextAttribute(void *, unsigned short);
+#endif
+
+// stb__print parses simple color escapes: {!...} prints red, {@...} blue,
+// {$...} green, {#h...} uses console attribute h (one hex digit), and {{
+// is a literal brace; the attribute is reset to grey (0x07) at each '}'.
+static void stb__print_one(void *handle, char *s, ptrdiff_t len)
+{
+ if (len)
+ if (0==WriteConsoleA(handle, s, (unsigned) len, NULL,NULL))
+ // if it fails, maybe redirected, so output normally...
+ // but it's spuriously reporting failure now on Win7 and later
+ {}//fwrite(s, 1, (unsigned) len, stdout);
+}
+
+static void stb__print(char *s)
+{
+ void *handle = GetStdHandle((unsigned int) -11); // STD_OUTPUT_HANDLE
+ int pad=0; // number of padding characters to add
+
+ char *t = s;
+ while (*s) {
+ int lpad;
+ while (*s && *s != '{') {
+ if (pad) {
+ if (*s == '\r' || *s == '\n')
+ pad = 0;
+ else if (s[0] == ' ' && s[1] == ' ') {
+ stb__print_one(handle, t, s-t);
+ t = s;
+ while (pad) {
+ stb__print_one(handle, t, 1);
+ --pad;
+ }
+ }
+ }
+ ++s;
+ }
+ if (!*s) break;
+ stb__print_one(handle, t, s-t);
+ if (s[1] == '{') {
+ ++s;
+ continue;
+ }
+
+ if (s[1] == '#') {
+ t = s+3;
+ if (isxdigit(s[2]))
+ if (isdigit(s[2]))
+ SetConsoleTextAttribute(handle, s[2] - '0');
+ else
+ SetConsoleTextAttribute(handle, tolower(s[2]) - 'a' + 10);
+ else {
+ SetConsoleTextAttribute(handle, 0x0f);
+ t=s+2;
+ }
+ } else if (s[1] == '!') {
+ SetConsoleTextAttribute(handle, 0x0c);
+ t = s+2;
+ } else if (s[1] == '@') {
+ SetConsoleTextAttribute(handle, 0x09);
+ t = s+2;
+ } else if (s[1] == '$') {
+ SetConsoleTextAttribute(handle, 0x0a);
+ t = s+2;
+ } else {
+ SetConsoleTextAttribute(handle, 0x08); // 0,7,8,15 => shades of grey
+ t = s+1;
+ }
+
+ lpad = (int) (t-s);
+ s = t;
+ while (*s && *s != '}') ++s;
+ if (!*s) break;
+ stb__print_one(handle, t, s-t);
+ if (s[1] == '}') {
+ t = s+2;
+ } else {
+ pad += 1+lpad;
+ t = s+1;
+ }
+ s=t;
+ SetConsoleTextAttribute(handle, 0x07);
+ }
+ stb__print_one(handle, t, s-t);
+ SetConsoleTextAttribute(handle, 0x07);
+}
+
+void stbprint(const char *fmt, ...)
+{
+ int res;
+ char buffer[1024];
+ char *tbuf = buffer;
+ va_list v;
+
+ va_start(v,fmt);
+ res = stb_vsnprintf(buffer, sizeof(buffer), fmt, v);
+ va_end(v);
+
+ if (res < 0) {
+ tbuf = (char *) malloc(16384);
+ va_start(v,fmt);
+ res = stb_vsnprintf(tbuf,16384, fmt, v);
+ va_end(v);
+ tbuf[16383] = 0;
+ }
+
+ stb__print(tbuf);
+
+ if (tbuf != buffer)
+ free(tbuf);
+}
+
+#else // _WIN32
+void stbprint(const char *fmt, ...)
+{
+ va_list v;
+ va_start(v,fmt);
+ vprintf(fmt,v);
+ va_end(v);
+}
+#endif // _WIN32
+#endif // STB_DEFINE
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Windows UTF8 filename handling
+//
+// Windows stupidly treats 8-bit filenames as some dopey code page,
+// rather than utf-8. If we want to use utf8 filenames, we have to
+// convert them to WCHAR explicitly and call WCHAR versions of the
+// file functions. So, ok, we do.
+
+
+#ifdef _WIN32
+ #define stb__fopen(x,y) stb_p_wfopen((const wchar_t *)stb__from_utf8(x), (const wchar_t *)stb__from_utf8_alt(y))
+ #define stb__windows(x,y) x
+#else
+ #define stb__fopen(x,y) stb_p_fopen(x,y)
+ #define stb__windows(x,y) y
+#endif
+
+
+typedef unsigned short stb__wchar;
+
+STB_EXTERN stb__wchar * stb_from_utf8(stb__wchar *buffer, const char *str, int n);
+STB_EXTERN char * stb_to_utf8 (char *buffer, const stb__wchar *str, int n);
+
+STB_EXTERN stb__wchar *stb__from_utf8(const char *str);
+STB_EXTERN stb__wchar *stb__from_utf8_alt(const char *str);
+STB_EXTERN char *stb__to_utf8(const stb__wchar *str);
+
+
+#ifdef STB_DEFINE
+stb__wchar * stb_from_utf8(stb__wchar *buffer, const char *ostr, int n)
+{
+ unsigned char *str = (unsigned char *) ostr;
+ stb_uint32 c;
+ int i=0;
+ --n;
+ while (*str) {
+ if (i >= n)
+ return NULL;
+ if (!(*str & 0x80))
+ buffer[i++] = *str++;
+ else if ((*str & 0xe0) == 0xc0) {
+ if (*str < 0xc2) return NULL;
+ c = (*str++ & 0x1f) << 6;
+ if ((*str & 0xc0) != 0x80) return NULL;
+ buffer[i++] = c + (*str++ & 0x3f);
+ } else if ((*str & 0xf0) == 0xe0) {
+ if (*str == 0xe0 && (str[1] < 0xa0 || str[1] > 0xbf)) return NULL;
+ if (*str == 0xed && str[1] > 0x9f) return NULL; // str[1] < 0x80 is checked below
+ c = (*str++ & 0x0f) << 12;
+ if ((*str & 0xc0) != 0x80) return NULL;
+ c += (*str++ & 0x3f) << 6;
+ if ((*str & 0xc0) != 0x80) return NULL;
+ buffer[i++] = c + (*str++ & 0x3f);
+ } else if ((*str & 0xf8) == 0xf0) {
+ if (*str > 0xf4) return NULL;
+ if (*str == 0xf0 && (str[1] < 0x90 || str[1] > 0xbf)) return NULL;
+ if (*str == 0xf4 && str[1] > 0x8f) return NULL; // str[1] < 0x80 is checked below
+ c = (*str++ & 0x07) << 18;
+ if ((*str & 0xc0) != 0x80) return NULL;
+ c += (*str++ & 0x3f) << 12;
+ if ((*str & 0xc0) != 0x80) return NULL;
+ c += (*str++ & 0x3f) << 6;
+ if ((*str & 0xc0) != 0x80) return NULL;
+ c += (*str++ & 0x3f);
+ // utf-8 encodings of values used in surrogate pairs are invalid
+ if ((c & 0xFFFFF800) == 0xD800) return NULL;
+ if (c >= 0x10000) {
+ c -= 0x10000;
+ if (i + 2 > n) return NULL;
+ buffer[i++] = 0xD800 | (0x3ff & (c >> 10));
+ buffer[i++] = 0xDC00 | (0x3ff & (c ));
+ }
+ } else
+ return NULL;
+ }
+ buffer[i] = 0;
+ return buffer;
+}
+
+char * stb_to_utf8(char *buffer, const stb__wchar *str, int n)
+{
+ int i=0;
+ --n;
+ while (*str) {
+ if (*str < 0x80) {
+ if (i+1 > n) return NULL;
+ buffer[i++] = (char) *str++;
+ } else if (*str < 0x800) {
+ if (i+2 > n) return NULL;
+ buffer[i++] = 0xc0 + (*str >> 6);
+ buffer[i++] = 0x80 + (*str & 0x3f);
+ str += 1;
+ } else if (*str >= 0xd800 && *str < 0xdc00) {
+ stb_uint32 c;
+ if (i+4 > n) return NULL;
+ c = ((str[0] - 0xd800) << 10) + ((str[1]) - 0xdc00) + 0x10000;
+ buffer[i++] = 0xf0 + (c >> 18);
+ buffer[i++] = 0x80 + ((c >> 12) & 0x3f);
+ buffer[i++] = 0x80 + ((c >> 6) & 0x3f);
+ buffer[i++] = 0x80 + ((c ) & 0x3f);
+ str += 2;
+ } else if (*str >= 0xdc00 && *str < 0xe000) {
+ return NULL;
+ } else {
+ if (i+3 > n) return NULL;
+ buffer[i++] = 0xe0 + (*str >> 12);
+ buffer[i++] = 0x80 + ((*str >> 6) & 0x3f);
+ buffer[i++] = 0x80 + ((*str ) & 0x3f);
+ str += 1;
+ }
+ }
+ buffer[i] = 0;
+ return buffer;
+}
+
+stb__wchar *stb__from_utf8(const char *str)
+{
+ static stb__wchar buffer[4096];
+ return stb_from_utf8(buffer, str, 4096);
+}
+
+stb__wchar *stb__from_utf8_alt(const char *str)
+{
+ static stb__wchar buffer[4096];
+ return stb_from_utf8(buffer, str, 4096);
+}
+
+char *stb__to_utf8(const stb__wchar *str)
+{
+ static char buffer[4096];
+ return stb_to_utf8(buffer, str, 4096);
+}
+
+#endif
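The surrogate-pair arithmetic used in both directions above can be checked in isolation. This sketch (hypothetical helper names) splits a supplementary-plane code point into a UTF-16 pair and recombines it, using the same formulas as stb_from_utf8 and stb_to_utf8:

```c
// Split a code point >= 0x10000 into a UTF-16 surrogate pair and
// recombine it, using the same shifts/masks as the functions above.
static void to_surrogates(unsigned int c, unsigned short *hi, unsigned short *lo)
{
    c -= 0x10000;
    *hi = (unsigned short)(0xD800 | (0x3FF & (c >> 10)));  // high surrogate
    *lo = (unsigned short)(0xDC00 | (0x3FF & c));          // low surrogate
}

static unsigned int from_surrogates(unsigned short hi, unsigned short lo)
{
    return ((unsigned int)(hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
}
```
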
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Miscellany
+//
+
+STB_EXTERN void stb_fatal(const char *fmt, ...);
+STB_EXTERN void stb_(char *fmt, ...);
+STB_EXTERN void stb_append_to_file(char *file, char *fmt, ...);
+STB_EXTERN void stb_log(int active);
+STB_EXTERN void stb_log_fileline(int active);
+STB_EXTERN void stb_log_name(char *filename);
+
+STB_EXTERN void stb_swap(void *p, void *q, size_t sz);
+STB_EXTERN void *stb_copy(void *p, size_t sz);
+STB_EXTERN void stb_pointer_array_free(void *p, int len);
+STB_EXTERN void **stb_array_block_alloc(int count, int blocksize);
+
+#define stb_arrcount(x) (sizeof(x)/sizeof((x)[0]))
+
+
+STB_EXTERN int stb__record_fileline(const char *f, int n);
+
+#ifdef STB_DEFINE
+
+static char *stb__file;
+static int stb__line;
+
+int stb__record_fileline(const char *f, int n)
+{
+ stb__file = (char*) f;
+ stb__line = n;
+ return 0;
+}
+
+void stb_fatal(const char *s, ...)
+{
+ va_list a;
+ if (stb__file)
+ fprintf(stderr, "[%s:%d] ", stb__file, stb__line);
+ va_start(a,s);
+ fputs("Fatal error: ", stderr);
+ vfprintf(stderr, s, a);
+ va_end(a);
+ fputs("\n", stderr);
+ #ifdef STB_DEBUG
+ #ifdef _MSC_VER
+ #ifndef STB_PTR64
+ __asm int 3; // trap to debugger!
+ #else
+ __debugbreak();
+ #endif
+ #else
+ __builtin_trap();
+ #endif
+ #endif
+ exit(1);
+}
+
+static int stb__log_active=1, stb__log_fileline=1;
+
+void stb_log(int active)
+{
+ stb__log_active = active;
+}
+
+void stb_log_fileline(int active)
+{
+ stb__log_fileline = active;
+}
+
+#ifdef STB_NO_STB_STRINGS
+const char *stb__log_filename = "temp.log";
+#else
+const char *stb__log_filename = "stb.log";
+#endif
+
+void stb_log_name(char *s)
+{
+ stb__log_filename = s;
+}
+
+void stb_(char *s, ...)
+{
+ if (stb__log_active) {
+ FILE *f = stb_p_fopen(stb__log_filename, "a");
+ if (f) {
+ va_list a;
+ if (stb__log_fileline && stb__file)
+ fprintf(f, "[%s:%4d] ", stb__file, stb__line);
+ va_start(a,s);
+ vfprintf(f, s, a);
+ va_end(a);
+ fputs("\n", f);
+ fclose(f);
+ }
+ }
+}
+
+void stb_append_to_file(char *filename, char *s, ...)
+{
+ FILE *f = stb_p_fopen(filename, "a");
+ if (f) {
+ va_list a;
+ va_start(a,s);
+ vfprintf(f, s, a);
+ va_end(a);
+ fputs("\n", f);
+ fclose(f);
+ }
+}
+
+
+typedef struct { char d[4]; } stb__4;
+typedef struct { char d[8]; } stb__8;
+
+// optimize the small cases, though you shouldn't be calling this for those!
+void stb_swap(void *p, void *q, size_t sz)
+{
+ char buffer[256];
+ if (p == q) return;
+ if (sz == 4) {
+ stb__4 temp = * ( stb__4 *) p;
+ * (stb__4 *) p = * ( stb__4 *) q;
+ * (stb__4 *) q = temp;
+ return;
+ } else if (sz == 8) {
+ stb__8 temp = * ( stb__8 *) p;
+ * (stb__8 *) p = * ( stb__8 *) q;
+ * (stb__8 *) q = temp;
+ return;
+ }
+
+ while (sz > sizeof(buffer)) {
+ stb_swap(p, q, sizeof(buffer));
+ p = (char *) p + sizeof(buffer);
+ q = (char *) q + sizeof(buffer);
+ sz -= sizeof(buffer);
+ }
+
+ memcpy(buffer, p , sz);
+ memcpy(p , q , sz);
+ memcpy(q , buffer, sz);
+}
+
+void *stb_copy(void *p, size_t sz)
+{
+ void *q = malloc(sz);
+ if (q) memcpy(q, p, sz);
+ return q;
+}
+
+void stb_pointer_array_free(void *q, int len)
+{
+ void **p = (void **) q;
+ int i;
+ for (i=0; i < len; ++i)
+ free(p[i]);
+}
+
+void **stb_array_block_alloc(int count, int blocksize)
+{
+ int i;
+ char *p = (char *) malloc(sizeof(void *) * count + count * blocksize);
+ void **q;
+ if (p == NULL) return NULL;
+ q = (void **) p;
+ p += sizeof(void *) * count;
+ for (i=0; i < count; ++i)
+ q[i] = p + i * blocksize;
+ return q;
+}
+#endif
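stb_array_block_alloc packs the pointer table and all the blocks into one malloc, so a single free() of the returned table releases everything. A reduced version of the same layout trick (`block_alloc` is an illustrative name):

```c
#include <stdlib.h>

// One allocation = count pointers followed by count blocks;
// q[i] points at block i inside that same allocation.
static void **block_alloc(int count, int blocksize)
{
    char  *p = (char *) malloc(sizeof(void *) * count + (size_t) count * blocksize);
    void **q = (void **) p;
    int i;
    if (p == NULL) return NULL;
    p += sizeof(void *) * count;     // blocks start right after the table
    for (i = 0; i < count; ++i)
        q[i] = p + i * blocksize;
    return q;
}
```

Because table and blocks share one allocation, `free(q)` is the entire cleanup.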
+
+#ifdef STB_DEBUG
+ // tricky hack to allow recording FILE,LINE even in varargs functions
+ #define STB__RECORD_FILE(x) (stb__record_fileline(__FILE__, __LINE__),(x))
+ #define stb_log STB__RECORD_FILE(stb_log)
+ #define stb_ STB__RECORD_FILE(stb_)
+ #ifndef STB_FATAL_CLEAN
+ #define stb_fatal STB__RECORD_FILE(stb_fatal)
+ #endif
+ #define STB__DEBUG(x) x
+#else
+ #define STB__DEBUG(x)
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_temp
+//
+
+#define stb_temp(block, sz) stb__temp(block, sizeof(block), (sz))
+
+STB_EXTERN void * stb__temp(void *b, int b_sz, int want_sz);
+STB_EXTERN void stb_tempfree(void *block, void *ptr);
+
+#ifdef STB_DEFINE
+
+void * stb__temp(void *b, int b_sz, int want_sz)
+{
+ if (b_sz >= want_sz)
+ return b;
+ else
+ return malloc(want_sz);
+}
+
+void stb_tempfree(void *b, void *p)
+{
+ if (p != b)
+ free(p);
+}
+#endif
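stb_temp implements the common stack-buffer-with-heap-fallback idiom: use a caller-provided local array when it is big enough, otherwise fall back to malloc, and free only if the fallback was taken. A standalone sketch with illustrative names:

```c
#include <stdlib.h>

// Use the caller's buffer if it is large enough, else heap-allocate.
static void *temp_get(void *buf, size_t buf_sz, size_t want_sz)
{
    return (buf_sz >= want_sz) ? buf : malloc(want_sz);
}

// Free only if temp_get fell back to the heap.
static void temp_release(void *buf, void *p)
{
    if (p != buf) free(p);
}
```
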
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// math/sampling operations
+//
+
+
+#define stb_lerp(t,a,b) ( (a) + (t) * (float) ((b)-(a)) )
+#define stb_unlerp(t,a,b) ( ((t) - (a)) / (float) ((b) - (a)) )
+
+#define stb_clamp(x,xmin,xmax) ((x) < (xmin) ? (xmin) : (x) > (xmax) ? (xmax) : (x))
+
+STB_EXTERN void stb_newell_normal(float *normal, int num_vert, float **vert, int normalize);
+STB_EXTERN int stb_box_face_vertex_axis_side(int face_number, int vertex_number, int axis);
+STB_EXTERN void stb_linear_controller(float *curpos, float target_pos, float acc, float deacc, float dt);
+
+STB_EXTERN int stb_float_eq(float x, float y, float delta, int max_ulps);
+STB_EXTERN int stb_is_prime(unsigned int m);
+STB_EXTERN unsigned int stb_power_of_two_nearest_prime(int n);
+
+STB_EXTERN float stb_smoothstep(float t);
+STB_EXTERN float stb_cubic_bezier_1d(float t, float p0, float p1, float p2, float p3);
+
+STB_EXTERN double stb_linear_remap(double x, double a, double b,
+ double c, double d);
+
+#ifdef STB_DEFINE
+float stb_smoothstep(float t)
+{
+ return (3 - 2*t)*(t*t);
+}
+
+float stb_cubic_bezier_1d(float t, float p0, float p1, float p2, float p3)
+{
+ float it = 1-t;
+ return it*it*it*p0 + 3*it*it*t*p1 + 3*it*t*t*p2 + t*t*t*p3;
+}
+
+void stb_newell_normal(float *normal, int num_vert, float **vert, int normalize)
+{
+ int i,j;
+ float p;
+ normal[0] = normal[1] = normal[2] = 0;
+ for (i=num_vert-1,j=0; j < num_vert; i=j++) {
+ float *u = vert[i];
+ float *v = vert[j];
+ normal[0] += (u[1] - v[1]) * (u[2] + v[2]);
+ normal[1] += (u[2] - v[2]) * (u[0] + v[0]);
+ normal[2] += (u[0] - v[0]) * (u[1] + v[1]);
+ }
+ if (normalize) {
+ p = normal[0]*normal[0] + normal[1]*normal[1] + normal[2]*normal[2];
+ p = (float) (1.0 / sqrt(p));
+ normal[0] *= p;
+ normal[1] *= p;
+ normal[2] *= p;
+ }
+}
+
+int stb_box_face_vertex_axis_side(int face_number, int vertex_number, int axis)
+{
+ static int box_vertices[6][4][3] =
+ {
+ { { 1,1,1 }, { 1,0,1 }, { 1,0,0 }, { 1,1,0 } },
+ { { 0,0,0 }, { 0,0,1 }, { 0,1,1 }, { 0,1,0 } },
+ { { 0,0,0 }, { 0,1,0 }, { 1,1,0 }, { 1,0,0 } },
+ { { 0,0,0 }, { 1,0,0 }, { 1,0,1 }, { 0,0,1 } },
+ { { 1,1,1 }, { 0,1,1 }, { 0,0,1 }, { 1,0,1 } },
+ { { 1,1,1 }, { 1,1,0 }, { 0,1,0 }, { 0,1,1 } },
+ };
+ assert(face_number >= 0 && face_number < 6);
+ assert(vertex_number >= 0 && vertex_number < 4);
+ assert(axis >= 0 && axis < 3);
+ return box_vertices[face_number][vertex_number][axis];
+}
+
+void stb_linear_controller(float *curpos, float target_pos, float acc, float deacc, float dt)
+{
+ float sign = 1, p, cp = *curpos;
+ if (cp == target_pos) return;
+ if (target_pos < cp) {
+ target_pos = -target_pos;
+ cp = -cp;
+ sign = -1;
+ }
+ // first decelerate
+ if (cp < 0) {
+ p = cp + deacc * dt;
+ if (p > 0) {
+ p = 0;
+ dt = dt - cp / deacc;
+ if (dt < 0) dt = 0;
+ } else {
+ dt = 0;
+ }
+ cp = p;
+ }
+ // now accelerate
+ p = cp + acc*dt;
+ if (p > target_pos) p = target_pos;
+ *curpos = p * sign;
+ // @TODO: testing
+}
+
+float stb_quadratic_controller(float target_pos, float curpos, float maxvel, float maxacc, float dt, float *curvel)
+{
+   // @TODO: not implemented; reference the parameters so builds stay warning-clean
+   (void) target_pos; (void) curpos; (void) maxvel;
+   (void) maxacc; (void) dt; (void) curvel;
+   return 0;
+}
+
+int stb_float_eq(float x, float y, float delta, int max_ulps)
+{
+   int ix, iy;
+   if (fabs(x-y) <= delta) return 1;
+   // compare bit patterns via memcpy rather than pointer casts,
+   // which would violate strict aliasing
+   memcpy(&ix, &x, sizeof(ix));
+   memcpy(&iy, &y, sizeof(iy));
+   if (abs(ix - iy) <= max_ulps) return 1;
+   return 0;
+}
+
+int stb_is_prime(unsigned int m)
+{
+ unsigned int i,j;
+ if (m < 2) return 0;
+ if (m == 2) return 1;
+ if (!(m & 1)) return 0;
+ if (m % 3 == 0) return (m == 3);
+ for (i=5; (j=i*i), j <= m && j > i; i += 6) {
+ if (m % i == 0) return 0;
+ if (m % (i+2) == 0) return 0;
+ }
+ return 1;
+}
+
+unsigned int stb_power_of_two_nearest_prime(int n)
+{
+   // signed-char offsets from 2^n - 1 to the nearest prime; the primes
+   // themselves don't fit in a signed char, so build them into an
+   // unsigned int table once instead of overwriting the offsets in place
+   static const signed char offset[32] = { 0,0,0,0,1,0,-1,0,1,-1,-1,3,-1,0,-1,2,1,
+                                           0,2,0,-1,-4,-1,5,-1,18,-2,15,2,-1,2,0 };
+   static unsigned int tab[32];
+   if (!tab[0]) {
+      int i;
+      for (i=0; i < 32; ++i)
+         tab[i] = (1u << i) + 2*offset[i] - 1;
+      tab[1] = 2;
+      tab[0] = 1;
+   }
+   if (n >= 32) return 0xfffffffb;
+   return tab[n];
+}
+
+double stb_linear_remap(double x, double x_min, double x_max,
+ double out_min, double out_max)
+{
+ return stb_lerp(stb_unlerp(x,x_min,x_max),out_min,out_max);
+}
+#endif
+
+// create a macro so it's faster, but you can get at the function pointer
+#define stb_linear_remap(t,a,b,c,d) stb_lerp(stb_unlerp(t,a,b),c,d)
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// bit operations
+//
+
+#define stb_big32(c) (((c)[0]<<24) + (c)[1]*65536 + (c)[2]*256 + (c)[3])
+#define stb_little32(c) (((c)[3]<<24) + (c)[2]*65536 + (c)[1]*256 + (c)[0])
+#define stb_big16(c) ((c)[0]*256 + (c)[1])
+#define stb_little16(c) ((c)[1]*256 + (c)[0])
+
+STB_EXTERN int stb_bitcount(unsigned int a);
+STB_EXTERN unsigned int stb_bitreverse8(unsigned char n);
+STB_EXTERN unsigned int stb_bitreverse(unsigned int n);
+
+STB_EXTERN int stb_is_pow2(size_t);
+STB_EXTERN int stb_log2_ceil(size_t);
+STB_EXTERN int stb_log2_floor(size_t);
+
+STB_EXTERN int stb_lowbit8(unsigned int n);
+STB_EXTERN int stb_highbit8(unsigned int n);
+
+#ifdef STB_DEFINE
+int stb_bitcount(unsigned int a)
+{
+ a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
+ a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
+ a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
+ a = (a + (a >> 8)); // max 16 per 8 bits
+ a = (a + (a >> 16)); // max 32 per 8 bits
+ return a & 0xff;
+}
+
+unsigned int stb_bitreverse8(unsigned char n)
+{
+ n = ((n & 0xAA) >> 1) + ((n & 0x55) << 1);
+ n = ((n & 0xCC) >> 2) + ((n & 0x33) << 2);
+ return (unsigned char) ((n >> 4) + (n << 4));
+}
+
+unsigned int stb_bitreverse(unsigned int n)
+{
+ n = ((n & 0xAAAAAAAA) >> 1) | ((n & 0x55555555) << 1);
+ n = ((n & 0xCCCCCCCC) >> 2) | ((n & 0x33333333) << 2);
+ n = ((n & 0xF0F0F0F0) >> 4) | ((n & 0x0F0F0F0F) << 4);
+ n = ((n & 0xFF00FF00) >> 8) | ((n & 0x00FF00FF) << 8);
+ return (n >> 16) | (n << 16);
+}
+
+int stb_is_pow2(size_t n)
+{
+ return (n & (n-1)) == 0;
+}
+
+// tricky use of 4-bit table to identify 5 bit positions (note the '-1')
+// 3-bit table would require another tree level; 5-bit table wouldn't save one
+#if defined(_WIN32) && !defined(__MINGW32__)
+#pragma warning(push)
+#pragma warning(disable: 4035) // disable warning about no return value
+int stb_log2_floor(size_t n)
+{
+ #if _MSC_VER > 1700
+ unsigned long i;
+   // _BitScanReverse returns nonzero iff any bit was set; testing 'i'
+   // instead would wrongly report -1 for n == 1
+   #ifdef STB_PTR64
+   return _BitScanReverse64(&i, n) ? (int) i : -1;
+   #else
+   return _BitScanReverse(&i, (unsigned long) n) ? (int) i : -1;
+   #endif
+ #else
+ __asm {
+ bsr eax,n
+ jnz done
+ mov eax,-1
+ }
+ done:;
+ #endif
+}
+#pragma warning(pop)
+#else
+int stb_log2_floor(size_t n)
+{
+ static signed char log2_4[16] = { -1,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3 };
+
+#ifdef STB_PTR64
+ if (n >= ((size_t) 1u << 32))
+ return stb_log2_floor(n >> 32);
+#endif
+
+ // 2 compares if n < 16, 3 compares otherwise
+ if (n < (1U << 14))
+ if (n < (1U << 4)) return 0 + log2_4[n ];
+ else if (n < (1U << 9)) return 5 + log2_4[n >> 5];
+ else return 10 + log2_4[n >> 10];
+ else if (n < (1U << 24))
+ if (n < (1U << 19)) return 15 + log2_4[n >> 15];
+ else return 20 + log2_4[n >> 20];
+ else if (n < (1U << 29)) return 25 + log2_4[n >> 25];
+ else return 30 + log2_4[n >> 30];
+}
+#endif
+
+// define ceil from floor
+int stb_log2_ceil(size_t n)
+{
+ if (stb_is_pow2(n)) return stb_log2_floor(n);
+ else return 1 + stb_log2_floor(n);
+}
+
+int stb_highbit8(unsigned int n)
+{
+ return stb_log2_ceil(n&255);
+}
+
+int stb_lowbit8(unsigned int n)
+{
+ static signed char lowbit4[16] = { -1,0,1,0, 2,0,1,0, 3,0,1,0, 2,0,1,0 };
+ int k = lowbit4[n & 15];
+ if (k >= 0) return k;
+ k = lowbit4[(n >> 4) & 15];
+ if (k >= 0) return k+4;
+ return k;
+}
+#endif
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// qsort Compare Routines
+//
+
+#ifdef _WIN32
+ #define stb_stricmp(a,b) stb_p_stricmp(a,b)
+ #define stb_strnicmp(a,b,n) stb_p_strnicmp(a,b,n)
+#else
+ #define stb_stricmp(a,b) strcasecmp(a,b)
+ #define stb_strnicmp(a,b,n) strncasecmp(a,b,n)
+#endif
+
+
+STB_EXTERN int (*stb_intcmp(int offset))(const void *a, const void *b);
+STB_EXTERN int (*stb_intcmprev(int offset))(const void *a, const void *b);
+STB_EXTERN int (*stb_qsort_strcmp(int offset))(const void *a, const void *b);
+STB_EXTERN int (*stb_qsort_stricmp(int offset))(const void *a, const void *b);
+STB_EXTERN int (*stb_floatcmp(int offset))(const void *a, const void *b);
+STB_EXTERN int (*stb_doublecmp(int offset))(const void *a, const void *b);
+STB_EXTERN int (*stb_charcmp(int offset))(const void *a, const void *b);
+
+#ifdef STB_DEFINE
+static int stb__intcmpoffset, stb__ucharcmpoffset, stb__strcmpoffset;
+static int stb__floatcmpoffset, stb__doublecmpoffset;
+static int stb__memcmpoffset, stb__memcmpsize;
+
+int stb__intcmp(const void *a, const void *b)
+{
+ const int p = *(const int *) ((const char *) a + stb__intcmpoffset);
+ const int q = *(const int *) ((const char *) b + stb__intcmpoffset);
+ return p < q ? -1 : p > q;
+}
+
+int stb__intcmprev(const void *a, const void *b)
+{
+ const int p = *(const int *) ((const char *) a + stb__intcmpoffset);
+ const int q = *(const int *) ((const char *) b + stb__intcmpoffset);
+ return q < p ? -1 : q > p;
+}
+
+int stb__ucharcmp(const void *a, const void *b)
+{
+ const int p = *(const unsigned char *) ((const char *) a + stb__ucharcmpoffset);
+ const int q = *(const unsigned char *) ((const char *) b + stb__ucharcmpoffset);
+ return p < q ? -1 : p > q;
+}
+
+int stb__floatcmp(const void *a, const void *b)
+{
+ const float p = *(const float *) ((const char *) a + stb__floatcmpoffset);
+ const float q = *(const float *) ((const char *) b + stb__floatcmpoffset);
+ return p < q ? -1 : p > q;
+}
+
+int stb__doublecmp(const void *a, const void *b)
+{
+ const double p = *(const double *) ((const char *) a + stb__doublecmpoffset);
+ const double q = *(const double *) ((const char *) b + stb__doublecmpoffset);
+ return p < q ? -1 : p > q;
+}
+
+int stb__qsort_strcmp(const void *a, const void *b)
+{
+ const char *p = *(const char **) ((const char *) a + stb__strcmpoffset);
+ const char *q = *(const char **) ((const char *) b + stb__strcmpoffset);
+ return strcmp(p,q);
+}
+
+int stb__qsort_stricmp(const void *a, const void *b)
+{
+ const char *p = *(const char **) ((const char *) a + stb__strcmpoffset);
+ const char *q = *(const char **) ((const char *) b + stb__strcmpoffset);
+ return stb_stricmp(p,q);
+}
+
+int stb__memcmp(const void *a, const void *b)
+{
+ return memcmp((char *) a + stb__memcmpoffset, (char *) b + stb__memcmpoffset, stb__memcmpsize);
+}
+
+int (*stb_intcmp(int offset))(const void *, const void *)
+{
+ stb__intcmpoffset = offset;
+ return &stb__intcmp;
+}
+
+int (*stb_intcmprev(int offset))(const void *, const void *)
+{
+ stb__intcmpoffset = offset;
+ return &stb__intcmprev;
+}
+
+int (*stb_ucharcmp(int offset))(const void *, const void *)
+{
+ stb__ucharcmpoffset = offset;
+ return &stb__ucharcmp;
+}
+
+int (*stb_qsort_strcmp(int offset))(const void *, const void *)
+{
+ stb__strcmpoffset = offset;
+ return &stb__qsort_strcmp;
+}
+
+int (*stb_qsort_stricmp(int offset))(const void *, const void *)
+{
+ stb__strcmpoffset = offset;
+ return &stb__qsort_stricmp;
+}
+
+int (*stb_floatcmp(int offset))(const void *, const void *)
+{
+ stb__floatcmpoffset = offset;
+ return &stb__floatcmp;
+}
+
+int (*stb_doublecmp(int offset))(const void *, const void *)
+{
+ stb__doublecmpoffset = offset;
+ return &stb__doublecmp;
+}
+
+int (*stb_memcmp(int offset, int size))(const void *, const void *)
+{
+ stb__memcmpoffset = offset;
+ stb__memcmpsize = size;
+ return &stb__memcmp;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Binary Search Toolkit
+//
+
+typedef struct
+{
+ int minval, maxval, guess;
+ int mode, step;
+} stb_search;
+
+STB_EXTERN int stb_search_binary(stb_search *s, int minv, int maxv, int find_smallest);
+STB_EXTERN int stb_search_open(stb_search *s, int minv, int find_smallest);
+STB_EXTERN int stb_probe(stb_search *s, int compare, int *result); // return 0 when done
+
+#ifdef STB_DEFINE
+enum
+{
+ STB_probe_binary_smallest,
+ STB_probe_binary_largest,
+ STB_probe_open_smallest,
+ STB_probe_open_largest,
+};
+
+static int stb_probe_guess(stb_search *s, int *result)
+{
+ switch(s->mode) {
+ case STB_probe_binary_largest:
+ if (s->minval == s->maxval) {
+ *result = s->minval;
+ return 0;
+ }
+ assert(s->minval < s->maxval);
+ // if a < b, then a < p <= b
+ s->guess = s->minval + (((unsigned) s->maxval - s->minval + 1) >> 1);
+ break;
+
+ case STB_probe_binary_smallest:
+ if (s->minval == s->maxval) {
+ *result = s->minval;
+ return 0;
+ }
+ assert(s->minval < s->maxval);
+ // if a < b, then a <= p < b
+ s->guess = s->minval + (((unsigned) s->maxval - s->minval) >> 1);
+ break;
+ case STB_probe_open_smallest:
+ case STB_probe_open_largest:
+ s->guess = s->maxval; // guess the current maxval
+ break;
+ }
+ *result = s->guess;
+ return 1;
+}
+
+int stb_probe(stb_search *s, int compare, int *result)
+{
+ switch(s->mode) {
+ case STB_probe_open_smallest:
+ case STB_probe_open_largest: {
+ if (compare <= 0) {
+ // then it lies within minval & maxval
+ if (s->mode == STB_probe_open_smallest)
+ s->mode = STB_probe_binary_smallest;
+ else
+ s->mode = STB_probe_binary_largest;
+ } else {
+ // otherwise, we need to probe larger
+ s->minval = s->maxval + 1;
+ s->maxval = s->minval + s->step;
+ s->step += s->step;
+ }
+ break;
+ }
+ case STB_probe_binary_smallest: {
+ // if compare < 0, then s->minval <= a < p
+ // if compare = 0, then s->minval <= a <= p
+ // if compare > 0, then p < a <= s->maxval
+ if (compare <= 0)
+ s->maxval = s->guess;
+ else
+ s->minval = s->guess+1;
+ break;
+ }
+ case STB_probe_binary_largest: {
+ // if compare < 0, then s->minval <= a < p
+ // if compare = 0, then p <= a <= s->maxval
+ // if compare > 0, then p < a <= s->maxval
+ if (compare < 0)
+ s->maxval = s->guess-1;
+ else
+ s->minval = s->guess;
+ break;
+ }
+ }
+ return stb_probe_guess(s, result);
+}
+
+int stb_search_binary(stb_search *s, int minv, int maxv, int find_smallest)
+{
+ int r;
+ if (maxv < minv) return minv-1;
+ s->minval = minv;
+ s->maxval = maxv;
+ s->mode = find_smallest ? STB_probe_binary_smallest : STB_probe_binary_largest;
+ stb_probe_guess(s, &r);
+ return r;
+}
+
+int stb_search_open(stb_search *s, int minv, int find_smallest)
+{
+ int r;
+ s->step = 4;
+ s->minval = minv;
+ s->maxval = minv+s->step;
+ s->mode = find_smallest ? STB_probe_open_smallest : STB_probe_open_largest;
+ stb_probe_guess(s, &r);
+ return r;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// String Processing
+//
+
+#define stb_prefixi(s,t) (0==stb_strnicmp((s),(t),strlen(t)))
+
+enum stb_splitpath_flag
+{
+ STB_PATH = 1,
+ STB_FILE = 2,
+ STB_EXT = 4,
+ STB_PATH_FILE = STB_PATH + STB_FILE,
+ STB_FILE_EXT = STB_FILE + STB_EXT,
+ STB_EXT_NO_PERIOD = 8,
+};
+
+STB_EXTERN char * stb_skipwhite(char *s);
+STB_EXTERN char * stb_trimwhite(char *s);
+STB_EXTERN char * stb_skipnewline(char *s);
+STB_EXTERN char * stb_strncpy(char *s, char *t, int n);
+STB_EXTERN char * stb_substr(char *t, int n);
+STB_EXTERN char * stb_duplower(char *s);
+STB_EXTERN void stb_tolower (char *s);
+STB_EXTERN char * stb_strchr2 (char *s, char p1, char p2);
+STB_EXTERN char * stb_strrchr2(char *s, char p1, char p2);
+STB_EXTERN char * stb_strtok(char *output, char *src, char *delimit);
+STB_EXTERN char * stb_strtok_keep(char *output, char *src, char *delimit);
+STB_EXTERN char * stb_strtok_invert(char *output, char *src, char *allowed);
+STB_EXTERN char * stb_dupreplace(char *s, char *find, char *replace);
+STB_EXTERN void stb_replaceinplace(char *s, char *find, char *replace);
+STB_EXTERN char * stb_splitpath(char *output, char *src, int flag);
+STB_EXTERN char * stb_splitpathdup(char *src, int flag);
+STB_EXTERN char * stb_replacedir(char *output, char *src, char *dir);
+STB_EXTERN char * stb_replaceext(char *output, char *src, char *ext);
+STB_EXTERN void stb_fixpath(char *path);
+STB_EXTERN char * stb_shorten_path_readable(char *path, int max_len);
+STB_EXTERN int stb_suffix (char *s, char *t);
+STB_EXTERN int stb_suffixi(char *s, char *t);
+STB_EXTERN int stb_prefix (char *s, char *t);
+STB_EXTERN char * stb_strichr(char *s, char t);
+STB_EXTERN char * stb_stristr(char *s, char *t);
+STB_EXTERN int stb_prefix_count(char *s, char *t);
+STB_EXTERN const char * stb_plural(int n); // "s" or ""
+STB_EXTERN size_t stb_strscpy(char *d, const char *s, size_t n);
+
+STB_EXTERN char **stb_tokens(char *src, char *delimit, int *count);
+STB_EXTERN char **stb_tokens_nested(char *src, char *delimit, int *count, char *nest_in, char *nest_out);
+STB_EXTERN char **stb_tokens_nested_empty(char *src, char *delimit, int *count, char *nest_in, char *nest_out);
+STB_EXTERN char **stb_tokens_allowempty(char *src, char *delimit, int *count);
+STB_EXTERN char **stb_tokens_stripwhite(char *src, char *delimit, int *count);
+STB_EXTERN char **stb_tokens_withdelim(char *src, char *delimit, int *count);
+STB_EXTERN char **stb_tokens_quoted(char *src, char *delimit, int *count);
+// with 'quoted', allow delimiters to appear inside quotation marks, and don't
+// strip whitespace inside them (and we delete the quotation marks unless they
+// appear back to back, in which case they're considered escaped)
+
+#ifdef STB_DEFINE
+
+size_t stb_strscpy(char *d, const char *s, size_t n)
+{
+ size_t len = strlen(s);
+ if (len >= n) {
+ if (n) d[0] = 0;
+ return 0;
+ }
+ stb_p_strcpy_s(d,n,s);
+ return len;
+}
+
+const char *stb_plural(int n)
+{
+ return n == 1 ? "" : "s";
+}
+
+int stb_prefix(char *s, char *t)
+{
+ while (*t)
+ if (*s++ != *t++)
+ return STB_FALSE;
+ return STB_TRUE;
+}
+
+int stb_prefix_count(char *s, char *t)
+{
+ int c=0;
+ while (*t) {
+ if (*s++ != *t++)
+ break;
+ ++c;
+ }
+ return c;
+}
+
+int stb_suffix(char *s, char *t)
+{
+ size_t n = strlen(s);
+ size_t m = strlen(t);
+ if (m <= n)
+ return 0 == strcmp(s+n-m, t);
+ else
+ return 0;
+}
+
+int stb_suffixi(char *s, char *t)
+{
+ size_t n = strlen(s);
+ size_t m = strlen(t);
+ if (m <= n)
+ return 0 == stb_stricmp(s+n-m, t);
+ else
+ return 0;
+}
+
+// originally I was using this table so that I could create known sentinel
+// values--e.g. change whitetable[0] to be true if I was scanning for whitespace,
+// and false if I was scanning for nonwhite. I don't appear to be using that
+// functionality anymore (I do for tokentable, though), so just replace it
+// with isspace()
+char *stb_skipwhite(char *s)
+{
+ while (isspace((unsigned char) *s)) ++s;
+ return s;
+}
+
+char *stb_skipnewline(char *s)
+{
+ if (s[0] == '\r' || s[0] == '\n') {
+      if (s[0]+s[1] == '\r' + '\n') ++s; // treat "\r\n" (or "\n\r") as a single newline
+ ++s;
+ }
+ return s;
+}
+
+char *stb_trimwhite(char *s)
+{
+ int i,n;
+ s = stb_skipwhite(s);
+ n = (int) strlen(s);
+   for (i=n-1; i >= 0; --i)
+      if (!isspace((unsigned char) s[i]))
+         break;
+ s[i+1] = 0;
+ return s;
+}
+
+char *stb_strncpy(char *s, char *t, int n)
+{
+ stb_p_strncpy_s(s,n+1,t,n);
+ s[n] = 0;
+ return s;
+}
+
+char *stb_substr(char *t, int n)
+{
+ char *a;
+ int z = (int) strlen(t);
+ if (z < n) n = z;
+ a = (char *) malloc(n+1);
+ stb_p_strncpy_s(a,n+1,t,n);
+ a[n] = 0;
+ return a;
+}
+
+char *stb_duplower(char *s)
+{
+ char *p = stb_p_strdup(s), *q = p;
+ while (*q) {
+ *q = tolower(*q);
+ ++q;
+ }
+ return p;
+}
+
+void stb_tolower(char *s)
+{
+ while (*s) {
+ *s = tolower(*s);
+ ++s;
+ }
+}
+
+char *stb_strchr2(char *s, char x, char y)
+{
+ for(; *s; ++s)
+ if (*s == x || *s == y)
+ return s;
+ return NULL;
+}
+
+char *stb_strrchr2(char *s, char x, char y)
+{
+ char *r = NULL;
+ for(; *s; ++s)
+ if (*s == x || *s == y)
+ r = s;
+ return r;
+}
+
+char *stb_strichr(char *s, char t)
+{
+ if (tolower(t) == toupper(t))
+ return strchr(s,t);
+ return stb_strchr2(s, (char) tolower(t), (char) toupper(t));
+}
+
+char *stb_stristr(char *s, char *t)
+{
+ size_t n = strlen(t);
+ char *z;
+ if (n==0) return s;
+ while ((z = stb_strichr(s, *t)) != NULL) {
+ if (0==stb_strnicmp(z, t, n))
+ return z;
+ s = z+1;
+ }
+ return NULL;
+}
+
+static char *stb_strtok_raw(char *output, char *src, char *delimit, int keep, int invert)
+{
+ if (invert) {
+ while (*src && strchr(delimit, *src) != NULL) {
+ *output++ = *src++;
+ }
+ } else {
+ while (*src && strchr(delimit, *src) == NULL) {
+ *output++ = *src++;
+ }
+ }
+ *output = 0;
+ if (keep)
+ return src;
+ else
+ return *src ? src+1 : src;
+}
+
+char *stb_strtok(char *output, char *src, char *delimit)
+{
+ return stb_strtok_raw(output, src, delimit, 0, 0);
+}
+
+char *stb_strtok_keep(char *output, char *src, char *delimit)
+{
+ return stb_strtok_raw(output, src, delimit, 1, 0);
+}
+
+char *stb_strtok_invert(char *output, char *src, char *delimit)
+{
+ return stb_strtok_raw(output, src, delimit, 1,1);
+}
+
+static char **stb_tokens_raw(char *src_, char *delimit, int *count,
+ int stripwhite, int allow_empty, char *start, char *end)
+{
+ int nested = 0;
+ unsigned char *src = (unsigned char *) src_;
+   static char stb_tokentable[256]; // rely on static initialization to 0
+ static char stable[256],etable[256];
+ char *out;
+ char **result;
+ int num=0;
+ unsigned char *s;
+
+ s = (unsigned char *) delimit; while (*s) stb_tokentable[*s++] = 1;
+ if (start) {
+ s = (unsigned char *) start; while (*s) stable[*s++] = 1;
+ s = (unsigned char *) end; if (s) while (*s) stable[*s++] = 1;
+ s = (unsigned char *) end; if (s) while (*s) etable[*s++] = 1;
+ }
+ stable[0] = 1;
+
+ // two passes through: the first time, counting how many
+ s = (unsigned char *) src;
+ while (*s) {
+ // state: just found delimiter
+ // skip further delimiters
+ if (!allow_empty) {
+ stb_tokentable[0] = 0;
+ while (stb_tokentable[*s])
+ ++s;
+ if (!*s) break;
+ }
+ ++num;
+ // skip further non-delimiters
+ stb_tokentable[0] = 1;
+ if (stripwhite == 2) { // quoted strings
+ while (!stb_tokentable[*s]) {
+ if (*s != '"')
+ ++s;
+ else {
+ ++s;
+ if (*s == '"')
+ ++s; // "" -> ", not start a string
+ else {
+ // begin a string
+ while (*s) {
+ if (s[0] == '"') {
+ if (s[1] == '"') s += 2; // "" -> "
+ else { ++s; break; } // terminating "
+ } else
+ ++s;
+ }
+ }
+ }
+ }
+ } else
+ while (nested || !stb_tokentable[*s]) {
+ if (stable[*s]) {
+ if (!*s) break;
+ if (end ? etable[*s] : nested)
+ --nested;
+ else
+ ++nested;
+ }
+ ++s;
+ }
+ if (allow_empty) {
+ if (*s) ++s;
+ }
+ }
+ // now num has the actual count... malloc our output structure
+ // need space for all the strings: strings won't be any longer than
+ // original input, since for every '\0' there's at least one delimiter
+ result = (char **) malloc(sizeof(*result) * (num+1) + (s-src+1));
+ if (result == NULL) return result;
+ out = (char *) (result + (num+1));
+ // second pass: copy out the data
+ s = (unsigned char *) src;
+ num = 0;
+ nested = 0;
+ while (*s) {
+ char *last_nonwhite;
+ // state: just found delimiter
+ // skip further delimiters
+ if (!allow_empty) {
+ stb_tokentable[0] = 0;
+ if (stripwhite)
+ while (stb_tokentable[*s] || isspace(*s))
+ ++s;
+ else
+ while (stb_tokentable[*s])
+ ++s;
+ } else if (stripwhite) {
+ while (isspace(*s)) ++s;
+ }
+ if (!*s) break;
+ // we're past any leading delimiters and whitespace
+ result[num] = out;
+ ++num;
+ // copy non-delimiters
+ stb_tokentable[0] = 1;
+ last_nonwhite = out-1;
+ if (stripwhite == 2) {
+ while (!stb_tokentable[*s]) {
+ if (*s != '"') {
+ if (!isspace(*s)) last_nonwhite = out;
+ *out++ = *s++;
+ } else {
+ ++s;
+ if (*s == '"') {
+ if (!isspace(*s)) last_nonwhite = out;
+ *out++ = *s++; // "" -> ", not start string
+ } else {
+ // begin a quoted string
+ while (*s) {
+ if (s[0] == '"') {
+ if (s[1] == '"') { *out++ = *s; s += 2; }
+ else { ++s; break; } // terminating "
+ } else
+ *out++ = *s++;
+ }
+ last_nonwhite = out-1; // all in quotes counts as non-white
+ }
+ }
+ }
+ } else {
+ while (nested || !stb_tokentable[*s]) {
+ if (!isspace(*s)) last_nonwhite = out;
+ if (stable[*s]) {
+ if (!*s) break;
+ if (end ? etable[*s] : nested)
+ --nested;
+ else
+ ++nested;
+ }
+ *out++ = *s++;
+ }
+ }
+
+ if (stripwhite) // rewind to last non-whitespace char
+ out = last_nonwhite+1;
+ *out++ = '\0';
+
+ if (*s) ++s; // skip delimiter
+ }
+   // reset the tables so the next call (with different delimiters) starts clean
+   s = (unsigned char *) delimit; while (*s) stb_tokentable[*s++] = 0;
+   if (start) {
+      s = (unsigned char *) start; while (*s) stable[*s++] = 0;
+      s = (unsigned char *) end; if (s) while (*s) stable[*s++] = 0;
+      s = (unsigned char *) end; if (s) while (*s) etable[*s++] = 0;
+   }
+ if (count != NULL) *count = num;
+ result[num] = 0;
+ return result;
+}
+
+char **stb_tokens(char *src, char *delimit, int *count)
+{
+ return stb_tokens_raw(src,delimit,count,0,0,0,0);
+}
+
+char **stb_tokens_nested(char *src, char *delimit, int *count, char *nest_in, char *nest_out)
+{
+ return stb_tokens_raw(src,delimit,count,0,0,nest_in,nest_out);
+}
+
+char **stb_tokens_nested_empty(char *src, char *delimit, int *count, char *nest_in, char *nest_out)
+{
+ return stb_tokens_raw(src,delimit,count,0,1,nest_in,nest_out);
+}
+
+char **stb_tokens_allowempty(char *src, char *delimit, int *count)
+{
+ return stb_tokens_raw(src,delimit,count,0,1,0,0);
+}
+
+char **stb_tokens_stripwhite(char *src, char *delimit, int *count)
+{
+ return stb_tokens_raw(src,delimit,count,1,1,0,0);
+}
+
+char **stb_tokens_quoted(char *src, char *delimit, int *count)
+{
+ return stb_tokens_raw(src,delimit,count,2,1,0,0);
+}
+
+char *stb_dupreplace(char *src, char *find, char *replace)
+{
+ size_t len_find = strlen(find);
+ size_t len_replace = strlen(replace);
+ int count = 0;
+
+ char *s,*p,*q;
+
+ s = strstr(src, find);
+ if (s == NULL) return stb_p_strdup(src);
+ do {
+ ++count;
+ s = strstr(s + len_find, find);
+ } while (s != NULL);
+
+ p = (char *) malloc(strlen(src) + count * (len_replace - len_find) + 1);
+ if (p == NULL) return p;
+ q = p;
+ s = src;
+ for (;;) {
+ char *t = strstr(s, find);
+ if (t == NULL) {
+ stb_p_strcpy_s(q,strlen(src)+count*(len_replace-len_find)+1,s);
+ assert(strlen(p) == strlen(src) + count*(len_replace-len_find));
+ return p;
+ }
+ memcpy(q, s, t-s);
+ q += t-s;
+ memcpy(q, replace, len_replace);
+ q += len_replace;
+ s = t + len_find;
+ }
+}
+
+void stb_replaceinplace(char *src, char *find, char *replace)
+{
+ size_t len_find = strlen(find);
+ size_t len_replace = strlen(replace);
+ int delta;
+
+ char *s,*p,*q;
+
+ delta = (int) (len_replace - len_find);
+ assert(delta <= 0);
+ if (delta > 0) return;
+
+ p = strstr(src, find);
+ if (p == NULL) return;
+
+ s = q = p;
+ while (*s) {
+ memcpy(q, replace, len_replace);
+ p += len_find;
+ q += len_replace;
+ s = strstr(p, find);
+ if (s == NULL) s = p + strlen(p);
+ memmove(q, p, s-p);
+ q += s-p;
+ p = s;
+ }
+ *q = 0;
+}
+
+void stb_fixpath(char *path)
+{
+ for(; *path; ++path)
+ if (*path == '\\')
+ *path = '/';
+}
+
+void stb__add_section(char *buffer, char *data, ptrdiff_t curlen, ptrdiff_t newlen)
+{
+ if (newlen < curlen) {
+ ptrdiff_t z1 = newlen >> 1, z2 = newlen-z1;
+ memcpy(buffer, data, z1-1);
+ buffer[z1-1] = '.';
+ buffer[z1-0] = '.';
+ memcpy(buffer+z1+1, data+curlen-z2+1, z2-1);
+ } else
+ memcpy(buffer, data, curlen);
+}
+
+char * stb_shorten_path_readable(char *path, int len)
+{
+ static char buffer[1024];
+ ptrdiff_t n = strlen(path),n1,n2,r1,r2;
+ char *s;
+ if (n <= len) return path;
+   if (len >= (int) sizeof(buffer)) return path; // need room for len chars plus NUL
+ s = stb_strrchr2(path, '/', '\\');
+ if (s) {
+ n1 = s - path + 1;
+ n2 = n - n1;
+ ++s;
+ } else {
+ n1 = 0;
+ n2 = n;
+ s = path;
+ }
+ // now we need to reduce r1 and r2 so that they fit in len
+ if (n1 < len>>1) {
+ r1 = n1;
+ r2 = len - r1;
+ } else if (n2 < len >> 1) {
+ r2 = n2;
+ r1 = len - r2;
+ } else {
+ r1 = n1 * len / n;
+ r2 = n2 * len / n;
+ if (r1 < len>>2) r1 = len>>2, r2 = len-r1;
+ if (r2 < len>>2) r2 = len>>2, r1 = len-r2;
+ }
+ assert(r1 <= n1 && r2 <= n2);
+ if (n1)
+ stb__add_section(buffer, path, n1, r1);
+ stb__add_section(buffer+r1, s, n2, r2);
+ buffer[len] = 0;
+ return buffer;
+}
+
+static char *stb__splitpath_raw(char *buffer, char *path, int flag)
+{
+ ptrdiff_t len=0,x,y, n = (int) strlen(path), f1,f2;
+ char *s = stb_strrchr2(path, '/', '\\');
+ char *t = strrchr(path, '.');
+ if (s && t && t < s) t = NULL;
+
+ if (!s) {
+ // check for drive
+      if (isalpha((unsigned char) path[0]) && path[1] == ':')
+ s = &path[1];
+ }
+ if (s) ++s;
+
+ if (flag == STB_EXT_NO_PERIOD)
+ flag |= STB_EXT;
+
+ if (!(flag & (STB_PATH | STB_FILE | STB_EXT))) return NULL;
+
+ f1 = s == NULL ? 0 : s-path; // start of filename
+ f2 = t == NULL ? n : t-path; // just past end of filename
+
+ if (flag & STB_PATH) {
+ x = 0; if (f1 == 0 && flag == STB_PATH) len=2;
+ } else if (flag & STB_FILE) {
+ x = f1;
+ } else {
+ x = f2;
+ if (flag & STB_EXT_NO_PERIOD)
+ if (path[x] == '.')
+ ++x;
+ }
+
+ if (flag & STB_EXT)
+ y = n;
+ else if (flag & STB_FILE)
+ y = f2;
+ else
+ y = f1;
+
+ if (buffer == NULL) {
+ buffer = (char *) malloc(y-x + len + 1);
+ if (!buffer) return NULL;
+ }
+
+ if (len) { stb_p_strcpy_s(buffer, 3, "./"); return buffer; }
+ stb_strncpy(buffer, path+(int)x, (int)(y-x));
+ return buffer;
+}
+
+char *stb_splitpath(char *output, char *src, int flag)
+{
+ return stb__splitpath_raw(output, src, flag);
+}
+
+char *stb_splitpathdup(char *src, int flag)
+{
+ return stb__splitpath_raw(NULL, src, flag);
+}
+
+char *stb_replacedir(char *output, char *src, char *dir)
+{
+ char buffer[4096];
+ stb_splitpath(buffer, src, STB_FILE | STB_EXT);
+ if (dir)
+ stb_p_sprintf(output stb_p_size(9999), "%s/%s", dir, buffer);
+ else
+ stb_p_strcpy_s(output, sizeof(buffer), buffer); // @UNSAFE
+ return output;
+}
+
+char *stb_replaceext(char *output, char *src, char *ext)
+{
+ char buffer[4096];
+ stb_splitpath(buffer, src, STB_PATH | STB_FILE);
+ if (ext)
+ stb_p_sprintf(output stb_p_size(9999), "%s.%s", buffer, ext[0] == '.' ? ext+1 : ext);
+ else
+ stb_p_strcpy_s(output, sizeof(buffer), buffer); // @UNSAFE
+ return output;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_alloc - hierarchical allocator
+//
+// inspired by http://swapped.cc/halloc
+//
+//
+// When you alloc a given block through stb_alloc, you have these choices:
+//
+// 1. does it have a parent?
+// 2. can it have children?
+// 3. can it be freed directly?
+// 4. is it transferrable?
+// 5. what is its alignment?
+//
+// Here are interesting combinations of those:
+//
+// children free transfer alignment
+// arena Y Y N n/a
+// no-overhead, chunked N N N normal
+// string pool alloc N N N 1
+// parent-ptr, chunked Y N N normal
+// low-overhead, unchunked N Y Y normal
+// general purpose alloc Y Y Y normal
+//
+// Unchunked allocations will probably return 16-aligned pointers. If
+// we 16-align the results, we have room for 4 pointers. For smaller
+// allocations that allow finer alignment, we can reduce the pointers.
+//
+// The strategy is that given a pointer, assuming it has a header (only
+// the no-overhead allocations have no header), we can determine the
+// type of the header fields, and the number of them, by stepping backwards
+// through memory and looking at the tags in the bottom bits.
+//
+// Implementation strategy:
+// chunked allocations come from the middle of chunks, and can't
+//     be freed. therefore they do not need to be on a sibling chain.
+// they may need child pointers if they have children.
+//
+// chunked, with-children
+// void *parent;
+//
+// unchunked, no-children -- reduced storage
+// void *next_sibling;
+// void *prev_sibling_nextp;
+//
+// unchunked, general
+// void *first_child;
+// void *next_sibling;
+// void *prev_sibling_nextp;
+// void *chunks;
+//
+// so, if we code each of these fields with different bit patterns
+// (actually same one for next/prev/child), then we can identify which
+// each one is from the last field.
+
+STB_EXTERN void stb_free(void *p);
+STB_EXTERN void *stb_malloc_global(size_t size);
+STB_EXTERN void *stb_malloc(void *context, size_t size);
+STB_EXTERN void *stb_malloc_nofree(void *context, size_t size);
+STB_EXTERN void *stb_malloc_leaf(void *context, size_t size);
+STB_EXTERN void *stb_malloc_raw(void *context, size_t size);
+STB_EXTERN void *stb_realloc(void *ptr, size_t newsize);
+
+STB_EXTERN void stb_reassign(void *new_context, void *ptr);
+STB_EXTERN void stb_malloc_validate(void *p, void *parent);
+
+extern int stb_alloc_chunk_size ;
+extern int stb_alloc_count_free ;
+extern int stb_alloc_count_alloc;
+extern int stb_alloc_alignment ;
+
+#ifdef STB_DEFINE
+
+int stb_alloc_chunk_size = 65536;
+int stb_alloc_count_free = 0;
+int stb_alloc_count_alloc = 0;
+int stb_alloc_alignment = -16;
+
+typedef struct stb__chunk
+{
+ struct stb__chunk *next;
+ int data_left;
+ int alloc;
+} stb__chunk;
+
+typedef struct
+{
+ void * next;
+ void ** prevn;
+} stb__nochildren;
+
+typedef struct
+{
+ void ** prevn;
+ void * child;
+ void * next;
+ stb__chunk *chunks;
+} stb__alloc;
+
+typedef struct
+{
+ stb__alloc *parent;
+} stb__chunked;
+
+#define STB__PARENT 1
+#define STB__CHUNKS 2
+
+typedef enum
+{
+ STB__nochildren = 0,
+ STB__chunked = STB__PARENT,
+ STB__alloc = STB__CHUNKS,
+
+ STB__chunk_raw = 4,
+} stb__alloc_type;
+
+// these macros store and strip a tag in the bottom bits of a pointer
+#define STB__DECODE(x,v) ((void *) ((char *) (x) - (v)))
+#define STB__ENCODE(x,v) ((void *) ((char *) (x) + (v)))
+
+#define stb__parent(z) (stb__alloc *) STB__DECODE((z)->parent, STB__PARENT)
+#define stb__chunks(z) (stb__chunk *) STB__DECODE((z)->chunks, STB__CHUNKS)
+
+#define stb__setparent(z,p) (z)->parent = (stb__alloc *) STB__ENCODE((p), STB__PARENT)
+#define stb__setchunks(z,c) (z)->chunks = (stb__chunk *) STB__ENCODE((c), STB__CHUNKS)
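+
+// sanity sketch of the encode/decode round-trip: the tag lives in the low
+// bits, which works because the encoded pointers are at least 4-aligned,
+// so those bits are otherwise zero. given a 4-aligned pointer p:
+//
+//    void *enc = STB__ENCODE(p, STB__PARENT);        // low bits now 01
+//    assert(STB__DECODE(enc, STB__PARENT) == p);
+//    assert(((stb_uinta) enc & 3) == STB__PARENT);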
+
+static stb__alloc stb__alloc_global =
+{
+ NULL,
+ NULL,
+ NULL,
+ (stb__chunk *) STB__ENCODE(NULL, STB__CHUNKS)
+};
+
+static stb__alloc_type stb__identify(void *p)
+{
+ void **q = (void **) p;
+ return (stb__alloc_type) ((stb_uinta) q[-1] & 3);
+}
+
+static void *** stb__prevn(void *p)
+{
+ if (stb__identify(p) == STB__alloc) {
+ stb__alloc *s = (stb__alloc *) p - 1;
+ return &s->prevn;
+ } else {
+ stb__nochildren *s = (stb__nochildren *) p - 1;
+ return &s->prevn;
+ }
+}
+
+void stb_free(void *p)
+{
+ if (p == NULL) return;
+
+ // count frees so that unit tests can see what's happening
+ ++stb_alloc_count_free;
+
+ switch(stb__identify(p)) {
+ case STB__chunked:
+ // freeing a chunked-block with children does nothing;
+ // they only get freed when the parent does
+ // surely this is wrong, and it should free them immediately?
+ // otherwise how are they getting put on the right chain?
+ return;
+ case STB__nochildren: {
+ stb__nochildren *s = (stb__nochildren *) p - 1;
+ // unlink from sibling chain
+ *(s->prevn) = s->next;
+ if (s->next)
+ *stb__prevn(s->next) = s->prevn;
+ free(s);
+ return;
+ }
+ case STB__alloc: {
+ stb__alloc *s = (stb__alloc *) p - 1;
+ stb__chunk *c, *n;
+ void *q;
+
+ // unlink from sibling chain, if any
+ *(s->prevn) = s->next;
+ if (s->next)
+ *stb__prevn(s->next) = s->prevn;
+
+ // first free chunks
+ c = (stb__chunk *) stb__chunks(s);
+ while (c != NULL) {
+ n = c->next;
+ stb_alloc_count_free += c->alloc;
+ free(c);
+ c = n;
+ }
+
+ // validating
+ stb__setchunks(s,NULL);
+ s->prevn = NULL;
+ s->next = NULL;
+
+ // now free children
+ while ((q = s->child) != NULL) {
+ stb_free(q);
+ }
+
+ // now free self
+ free(s);
+ return;
+ }
+ default:
+ assert(0); /* NOTREACHED */
+ }
+}
+
+void stb_malloc_validate(void *p, void *parent)
+{
+ if (p == NULL) return;
+
+ switch(stb__identify(p)) {
+ case STB__chunked:
+ return;
+ case STB__nochildren: {
+ stb__nochildren *n = (stb__nochildren *) p - 1;
+ if (n->prevn)
+ assert(*n->prevn == p);
+ if (n->next) {
+ assert(*stb__prevn(n->next) == &n->next);
+         stb_malloc_validate(n->next, parent);
+ }
+ return;
+ }
+ case STB__alloc: {
+ stb__alloc *s = (stb__alloc *) p - 1;
+
+ if (s->prevn)
+ assert(*s->prevn == p);
+
+ if (s->child) {
+ assert(*stb__prevn(s->child) == &s->child);
+ stb_malloc_validate(s->child, p);
+ }
+
+ if (s->next) {
+ assert(*stb__prevn(s->next) == &s->next);
+ stb_malloc_validate(s->next, parent);
+ }
+ return;
+ }
+ default:
+ assert(0); /* NOTREACHED */
+ }
+}
+
+static void * stb__try_chunk(stb__chunk *c, int size, int align, int pre_align)
+{
+ char *memblock = (char *) (c+1), *q;
+ stb_inta iq;
+ int start_offset;
+
+   // we're going to allocate at the end of the chunk, not the start. confusing,
+ // but it means we don't need both a 'limit' and a 'cur', just a 'cur'.
+ // the block ends at: p + c->data_left
+ // then we move back by size
+ start_offset = c->data_left - size;
+
+ // now we need to check the alignment of that
+ q = memblock + start_offset;
+ iq = (stb_inta) q;
+ assert(sizeof(q) == sizeof(iq));
+
+ // suppose align = 2
+ // then we need to retreat iq far enough that (iq & (2-1)) == 0
+ // to get (iq & (align-1)) = 0 requires subtracting (iq & (align-1))
+
+ start_offset -= iq & (align-1);
+ assert(((stb_uinta) (memblock+start_offset) & (align-1)) == 0);
+
+ // now, if that + pre_align works, go for it!
+ start_offset -= pre_align;
+
+ if (start_offset >= 0) {
+ c->data_left = start_offset;
+ return memblock + start_offset;
+ }
+
+ return NULL;
+}
+
+static void stb__sort_chunks(stb__alloc *src)
+{
+ // of the first two chunks, put the chunk with more data left in it first
+ stb__chunk *c = stb__chunks(src), *d;
+ if (c == NULL) return;
+ d = c->next;
+ if (d == NULL) return;
+ if (c->data_left > d->data_left) return;
+
+ c->next = d->next;
+ d->next = c;
+ stb__setchunks(src, d);
+}
+
+static void * stb__alloc_chunk(stb__alloc *src, int size, int align, int pre_align)
+{
+ void *p;
+ stb__chunk *c = stb__chunks(src);
+
+ if (c && size <= stb_alloc_chunk_size) {
+
+ p = stb__try_chunk(c, size, align, pre_align);
+ if (p) { ++c->alloc; return p; }
+
+ // try a second chunk to reduce wastage
+ if (c->next) {
+ p = stb__try_chunk(c->next, size, align, pre_align);
+ if (p) { ++c->alloc; return p; }
+
+         // put the bigger chunk first, since the second will get buried.
+ // the upshot of this is that, until it gets allocated from, chunk #2
+ // is always the largest remaining chunk. (could formalize
+ // this with a heap!)
+ stb__sort_chunks(src);
+ c = stb__chunks(src);
+ }
+ }
+
+ // allocate a new chunk
+ {
+ stb__chunk *n;
+
+ int chunk_size = stb_alloc_chunk_size;
+ // we're going to allocate a new chunk to put this in
+ if (size > chunk_size)
+ chunk_size = size;
+
+ assert(sizeof(*n) + pre_align <= 16);
+
+ // loop trying to allocate a large enough chunk
+ // the loop is because the alignment may cause problems if it's big...
+ // and we don't know what our chunk alignment is going to be
+ while (1) {
+ n = (stb__chunk *) malloc(16 + chunk_size);
+ if (n == NULL) return NULL;
+
+ n->data_left = chunk_size - sizeof(*n);
+
+ p = stb__try_chunk(n, size, align, pre_align);
+ if (p != NULL) {
+ n->next = c;
+ stb__setchunks(src, n);
+
+ // if we just used up the whole block immediately,
+ // move the following chunk up
+ n->alloc = 1;
+ if (size == chunk_size)
+ stb__sort_chunks(src);
+
+ return p;
+ }
+
+ free(n);
+ chunk_size += 16+align;
+ }
+ }
+}
+
+static stb__alloc * stb__get_context(void *context)
+{
+ if (context == NULL) {
+ return &stb__alloc_global;
+ } else {
+ int u = stb__identify(context);
+ // if context is chunked, grab parent
+ if (u == STB__chunked) {
+ stb__chunked *s = (stb__chunked *) context - 1;
+ return stb__parent(s);
+ } else {
+ return (stb__alloc *) context - 1;
+ }
+ }
+}
+
+static void stb__insert_alloc(stb__alloc *src, stb__alloc *s)
+{
+ s->prevn = &src->child;
+ s->next = src->child;
+ src->child = s+1;
+ if (s->next)
+ *stb__prevn(s->next) = &s->next;
+}
+
+static void stb__insert_nochild(stb__alloc *src, stb__nochildren *s)
+{
+ s->prevn = &src->child;
+ s->next = src->child;
+ src->child = s+1;
+ if (s->next)
+ *stb__prevn(s->next) = &s->next;
+}
+
+static void * malloc_base(void *context, size_t size, stb__alloc_type t, int align)
+{
+ void *p;
+
+ stb__alloc *src = stb__get_context(context);
+
+ if (align <= 0) {
+ // compute worst-case C packed alignment
+ // e.g. a 24-byte struct is 8-aligned
+ int align_proposed = 1 << stb_lowbit8((unsigned int) size);
+
+ if (align_proposed < 0)
+ align_proposed = 4;
+
+ if (align_proposed == 0) {
+ if (size == 0)
+ align_proposed = 1;
+ else
+ align_proposed = 256;
+ }
+
+ // a negative alignment means 'don't align any larger
+ // than this'; so -16 means we align 1,2,4,8, or 16
+
+ if (align < 0) {
+ if (align_proposed > -align)
+ align_proposed = -align;
+ }
+
+ align = align_proposed;
+ }
+
+ assert(stb_is_pow2(align));
+
+ // don't cause misalignment when allocating nochildren
+ if (t == STB__nochildren && align > 8)
+ t = STB__alloc;
+
+ switch (t) {
+ case STB__alloc: {
+ stb__alloc *s = (stb__alloc *) malloc(size + sizeof(*s));
+ if (s == NULL) return NULL;
+ p = s+1;
+ s->child = NULL;
+ stb__insert_alloc(src, s);
+
+ stb__setchunks(s,NULL);
+ break;
+ }
+
+ case STB__nochildren: {
+ stb__nochildren *s = (stb__nochildren *) malloc(size + sizeof(*s));
+ if (s == NULL) return NULL;
+ p = s+1;
+ stb__insert_nochild(src, s);
+ break;
+ }
+
+ case STB__chunk_raw: {
+ p = stb__alloc_chunk(src, (int) size, align, 0);
+ if (p == NULL) return NULL;
+ break;
+ }
+
+ case STB__chunked: {
+ stb__chunked *s;
+ if (align < sizeof(stb_uintptr)) align = sizeof(stb_uintptr);
+ s = (stb__chunked *) stb__alloc_chunk(src, (int) size, align, sizeof(*s));
+ if (s == NULL) return NULL;
+ stb__setparent(s, src);
+ p = s+1;
+ break;
+ }
+
+ default: p = NULL; assert(0); /* NOTREACHED */
+ }
+
+ ++stb_alloc_count_alloc;
+ return p;
+}
+
+void *stb_malloc_global(size_t size)
+{
+ return malloc_base(NULL, size, STB__alloc, stb_alloc_alignment);
+}
+
+void *stb_malloc(void *context, size_t size)
+{
+ return malloc_base(context, size, STB__alloc, stb_alloc_alignment);
+}
+
+void *stb_malloc_nofree(void *context, size_t size)
+{
+ return malloc_base(context, size, STB__chunked, stb_alloc_alignment);
+}
+
+void *stb_malloc_leaf(void *context, size_t size)
+{
+ return malloc_base(context, size, STB__nochildren, stb_alloc_alignment);
+}
+
+void *stb_malloc_raw(void *context, size_t size)
+{
+ return malloc_base(context, size, STB__chunk_raw, stb_alloc_alignment);
+}
+
+char *stb_malloc_string(void *context, size_t size)
+{
+ return (char *) malloc_base(context, size, STB__chunk_raw, 1);
+}
+
+void *stb_realloc(void *ptr, size_t newsize)
+{
+ stb__alloc_type t;
+
+ if (ptr == NULL) return stb_malloc(NULL, newsize);
+ if (newsize == 0) { stb_free(ptr); return NULL; }
+
+ t = stb__identify(ptr);
+ assert(t == STB__alloc || t == STB__nochildren);
+
+ if (t == STB__alloc) {
+ stb__alloc *s = (stb__alloc *) ptr - 1;
+
+ s = (stb__alloc *) realloc(s, newsize + sizeof(*s));
+ if (s == NULL) return NULL;
+
+ ptr = s+1;
+
+ // update pointers
+ (*s->prevn) = ptr;
+ if (s->next)
+ *stb__prevn(s->next) = &s->next;
+
+ if (s->child)
+ *stb__prevn(s->child) = &s->child;
+
+ return ptr;
+ } else {
+ stb__nochildren *s = (stb__nochildren *) ptr - 1;
+
+      s = (stb__nochildren *) realloc(s, newsize + sizeof(*s));
+ if (s == NULL) return NULL;
+
+ // update pointers
+ (*s->prevn) = s+1;
+ if (s->next)
+ *stb__prevn(s->next) = &s->next;
+
+ return s+1;
+ }
+}
+
+void *stb_realloc_c(void *context, void *ptr, size_t newsize)
+{
+ if (ptr == NULL) return stb_malloc(context, newsize);
+ if (newsize == 0) { stb_free(ptr); return NULL; }
+ // @TODO: verify you haven't changed contexts
+ return stb_realloc(ptr, newsize);
+}
+
+void stb_reassign(void *new_context, void *ptr)
+{
+ stb__alloc *src = stb__get_context(new_context);
+
+ stb__alloc_type t = stb__identify(ptr);
+ assert(t == STB__alloc || t == STB__nochildren);
+
+ if (t == STB__alloc) {
+ stb__alloc *s = (stb__alloc *) ptr - 1;
+
+ // unlink from old
+ *(s->prevn) = s->next;
+ if (s->next)
+ *stb__prevn(s->next) = s->prevn;
+
+ stb__insert_alloc(src, s);
+ } else {
+ stb__nochildren *s = (stb__nochildren *) ptr - 1;
+
+ // unlink from old
+ *(s->prevn) = s->next;
+ if (s->next)
+ *stb__prevn(s->next) = s->prevn;
+
+ stb__insert_nochild(src, s);
+ }
+}
+
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_arr
+//
+// An stb_arr is directly useable as a pointer (use the actual type in your
+// definition), but when it resizes, it returns a new pointer and you can't
+// use the old one, so you have to be careful to copy-in-out as necessary.
+//
+// Use a NULL pointer as a 0-length array.
+//
+// float *my_array = NULL, *temp;
+//
+// // add elements on the end one at a time
+// stb_arr_push(my_array, 0.0f);
+// stb_arr_push(my_array, 1.0f);
+// stb_arr_push(my_array, 2.0f);
+//
+//    assert(my_array[2] == 2.0f);
+//
+// // add an uninitialized element at the end, then assign it
+// *stb_arr_add(my_array) = 3.0f;
+//
+// // add three uninitialized elements at the end
+// temp = stb_arr_addn(my_array,3);
+// temp[0] = 4.0f;
+// temp[1] = 5.0f;
+// temp[2] = 6.0f;
+//
+// assert(my_array[5] == 5.0f);
+//
+// // remove the last one
+// stb_arr_pop(my_array);
+//
+// assert(stb_arr_len(my_array) == 6);
+
+
+#ifdef STB_MALLOC_WRAPPER
+ #define STB__PARAMS , char *file, int line
+ #define STB__ARGS , file, line
+#else
+ #define STB__PARAMS
+ #define STB__ARGS
+#endif
+
+// calling this function allocates an empty stb_arr attached to 'context'
+// (whereas a NULL array isn't attached to anything)
+STB_EXTERN void stb_arr_malloc(void **target, void *context);
+
+// call this function with a non-NULL value to have all subsequently
+// created stb_arrs be attached to the associated parent. Note
+// that once a given stb_arr is non-empty, it stays attached to its
+// current parent, even if you call this function again.
+// It returns the previous value, so you can restore it.
+STB_EXTERN void* stb_arr_malloc_parent(void *p);
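+
+// e.g. to attach several arrays to one context, then restore the old
+// parent ('my_context' here is an illustrative name):
+//
+//    void *prev = stb_arr_malloc_parent(my_context);
+//    ... create stb_arrs here; they are attached to my_context ...
+//    stb_arr_malloc_parent(prev);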
+
+// simple functions written on top of other functions
+#define stb_arr_empty(a) ( stb_arr_len(a) == 0 )
+#define stb_arr_add(a) ( stb_arr_addn((a),1) )
+#define stb_arr_push(a,v) ( *stb_arr_add(a)=(v) )
+
+typedef struct
+{
+ int len, limit;
+ int stb_malloc;
+ unsigned int signature;
+} stb__arr;
+
+#define stb_arr_signature 0x51bada7b // ends with 0123 in decimal
+
+// access the header block stored before the data
+#define stb_arrhead(a) /*lint --e(826)*/ (((stb__arr *) (a)) - 1)
+#define stb_arrhead2(a) /*lint --e(826)*/ (((stb__arr *) (a)) - 1)
+
+#ifdef STB_DEBUG
+#define stb_arr_check(a) assert(!a || stb_arrhead(a)->signature == stb_arr_signature)
+#define stb_arr_check2(a) assert(!a || stb_arrhead2(a)->signature == stb_arr_signature)
+#else
+#define stb_arr_check(a) ((void) 0)
+#define stb_arr_check2(a) ((void) 0)
+#endif
+
+// ARRAY LENGTH
+
+// get the array length; special case if pointer is NULL
+#define stb_arr_len(a) (a ? stb_arrhead(a)->len : 0)
+#define stb_arr_len2(a) ((stb__arr *) (a) ? stb_arrhead2(a)->len : 0)
+#define stb_arr_lastn(a) (stb_arr_len(a)-1)
+
+// check whether a given index is valid -- tests 0 <= i < stb_arr_len(a)
+#define stb_arr_valid(a,i) (a ? (int) (i) < stb_arrhead(a)->len : 0)
+
+// change the array length so it is exactly N entries long, creating
+// uninitialized entries as needed
+#define stb_arr_setlen(a,n) \
+ (stb__arr_setlen((void **) &(a), sizeof(a[0]), (n)))
+
+// change the array length so that N is a valid index (that is, so
+// it is at least N entries long), creating uninitialized entries as needed
+#define stb_arr_makevalid(a,n) \
+ (stb_arr_len(a) < (n)+1 ? stb_arr_setlen(a,(n)+1),(a) : (a))
+
+// remove the last element of the array, returning it
+#define stb_arr_pop(a) ((stb_arr_check(a), (a))[--stb_arrhead(a)->len])
+
+// access the last element in the array
+#define stb_arr_last(a) ((stb_arr_check(a), (a))[stb_arr_len(a)-1])
+
+// is iterator at end of list?
+#define stb_arr_end(a,i) ((i) >= &(a)[stb_arr_len(a)])
+
+// (internal) change the allocated length of the array
+#define stb_arr__grow(a,n) (stb_arr_check(a), stb_arrhead(a)->len += (n))
+
+// add N new uninitialized elements to the end of the array
+#define stb_arr__addn(a,n) /*lint --e(826)*/ \
+ ((stb_arr_len(a)+(n) > stb_arrcurmax(a)) \
+ ? (stb__arr_addlen((void **) &(a),sizeof(*a),(n)),0) \
+ : ((stb_arr__grow(a,n), 0)))
+
+// add N new uninitialized elements to the end of the array, and return
+// a pointer to the first new one
+#define stb_arr_addn(a,n) (stb_arr__addn((a),n),(a)+stb_arr_len(a)-(n))
+
+// add N new uninitialized elements starting at index 'i'
+#define stb_arr_insertn(a,i,n) (stb__arr_insertn((void **) &(a), sizeof(*a), (i), (n)))
+
+// insert an element at i
+#define stb_arr_insert(a,i,v) (stb__arr_insertn((void **) &(a), sizeof(*a), (i), (1)), ((a)[i] = v))
+
+// delete N elements from the middle starting at index 'i'
+#define stb_arr_deleten(a,i,n) (stb__arr_deleten((void **) &(a), sizeof(*a), (i), (n)))
+
+// delete the i'th element
+#define stb_arr_delete(a,i) stb_arr_deleten(a,i,1)
+
+// delete the i'th element, swapping down from the end
+#define stb_arr_fastdelete(a,i) \
+ (stb_swap(&a[i], &a[stb_arrhead(a)->len-1], sizeof(*a)), stb_arr_pop(a))
+
+
+// ARRAY STORAGE
+
+// get the array maximum storage; special case if NULL
+#define stb_arrcurmax(a) (a ? stb_arrhead(a)->limit : 0)
+#define stb_arrcurmax2(a) (a ? stb_arrhead2(a)->limit : 0)
+
+// set the maxlength of the array to n in anticipation of further growth
+#define stb_arr_setsize(a,n) (stb_arr_check(a), stb__arr_setsize((void **) &(a),sizeof((a)[0]),n))
+
+// make sure maxlength is large enough for at least N new allocations
+#define stb_arr_atleast(a,n) (stb_arr_len(a)+(n) > stb_arrcurmax(a) \
+ ? stb_arr_setsize((a), (n)) : 0)
+
+// make a copy of a given array (copies contents via 'memcpy'!)
+#define stb_arr_copy(a) stb__arr_copy(a, sizeof((a)[0]))
+
+// compute the storage needed to store all the elements of the array
+#define stb_arr_storage(a) (stb_arr_len(a) * sizeof((a)[0]))
+
+#define stb_arr_for(v,arr) for((v)=(arr); (v) < (arr)+stb_arr_len(arr); ++(v))
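+
+// e.g. to iterate over every element with a pointer of the element type:
+//
+//    float *v;
+//    stb_arr_for(v, my_array)
+//       *v *= 2.0f;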
+
+// IMPLEMENTATION
+
+STB_EXTERN void stb_arr_free_(void **p);
+STB_EXTERN void *stb__arr_copy_(void *p, int elem_size);
+STB_EXTERN void stb__arr_setsize_(void **p, int size, int limit STB__PARAMS);
+STB_EXTERN void stb__arr_setlen_(void **p, int size, int newlen STB__PARAMS);
+STB_EXTERN void stb__arr_addlen_(void **p, int size, int addlen STB__PARAMS);
+STB_EXTERN void stb__arr_deleten_(void **p, int size, int loc, int n STB__PARAMS);
+STB_EXTERN void stb__arr_insertn_(void **p, int size, int loc, int n STB__PARAMS);
+
+#define stb_arr_free(p) stb_arr_free_((void **) &(p))
+#define stb__arr_copy stb__arr_copy_
+
+#ifndef STB_MALLOC_WRAPPER
+ #define stb__arr_setsize stb__arr_setsize_
+ #define stb__arr_setlen stb__arr_setlen_
+ #define stb__arr_addlen stb__arr_addlen_
+ #define stb__arr_deleten stb__arr_deleten_
+ #define stb__arr_insertn stb__arr_insertn_
+#else
+ #define stb__arr_addlen(p,s,n) stb__arr_addlen_(p,s,n,__FILE__,__LINE__)
+ #define stb__arr_setlen(p,s,n) stb__arr_setlen_(p,s,n,__FILE__,__LINE__)
+ #define stb__arr_setsize(p,s,n) stb__arr_setsize_(p,s,n,__FILE__,__LINE__)
+ #define stb__arr_deleten(p,s,i,n) stb__arr_deleten_(p,s,i,n,__FILE__,__LINE__)
+ #define stb__arr_insertn(p,s,i,n) stb__arr_insertn_(p,s,i,n,__FILE__,__LINE__)
+#endif
+
+#ifdef STB_DEFINE
+static void *stb__arr_context;
+
+void *stb_arr_malloc_parent(void *p)
+{
+ void *q = stb__arr_context;
+ stb__arr_context = p;
+ return q;
+}
+
+void stb_arr_malloc(void **target, void *context)
+{
+ stb__arr *q = (stb__arr *) stb_malloc(context, sizeof(*q));
+ q->len = q->limit = 0;
+ q->stb_malloc = 1;
+ q->signature = stb_arr_signature;
+ *target = (void *) (q+1);
+}
+
+static void * stb__arr_malloc(int size)
+{
+ if (stb__arr_context)
+ return stb_malloc(stb__arr_context, size);
+ return malloc(size);
+}
+
+void * stb__arr_copy_(void *p, int elem_size)
+{
+ stb__arr *q;
+ if (p == NULL) return p;
+ q = (stb__arr *) stb__arr_malloc(sizeof(*q) + elem_size * stb_arrhead2(p)->limit);
+ stb_arr_check2(p);
+ memcpy(q, stb_arrhead2(p), sizeof(*q) + elem_size * stb_arrhead2(p)->len);
+ q->stb_malloc = !!stb__arr_context;
+ return q+1;
+}
+
+void stb_arr_free_(void **pp)
+{
+ void *p = *pp;
+ stb_arr_check2(p);
+ if (p) {
+ stb__arr *q = stb_arrhead2(p);
+ if (q->stb_malloc)
+ stb_free(q);
+ else
+ free(q);
+ }
+ *pp = NULL;
+}
+
+static void stb__arrsize_(void **pp, int size, int limit, int len STB__PARAMS)
+{
+ void *p = *pp;
+ stb__arr *a;
+ stb_arr_check2(p);
+ if (p == NULL) {
+ if (len == 0 && size == 0) return;
+ a = (stb__arr *) stb__arr_malloc(sizeof(*a) + size*limit);
+ a->limit = limit;
+ a->len = len;
+ a->stb_malloc = !!stb__arr_context;
+ a->signature = stb_arr_signature;
+ } else {
+ a = stb_arrhead2(p);
+ a->len = len;
+ if (a->limit < limit) {
+ void *p;
+ if (a->limit >= 4 && limit < a->limit * 2)
+ limit = a->limit * 2;
+ if (a->stb_malloc)
+ p = stb_realloc(a, sizeof(*a) + limit*size);
+ else
+ #ifdef STB_MALLOC_WRAPPER
+ p = stb__realloc(a, sizeof(*a) + limit*size, file, line);
+ #else
+ p = realloc(a, sizeof(*a) + limit*size);
+ #endif
+ if (p) {
+ a = (stb__arr *) p;
+ a->limit = limit;
+ } else {
+ // throw an error!
+ }
+ }
+ }
+ a->len = stb_min(a->len, a->limit);
+ *pp = a+1;
+}
+
+void stb__arr_setsize_(void **pp, int size, int limit STB__PARAMS)
+{
+ void *p = *pp;
+ stb_arr_check2(p);
+ stb__arrsize_(pp, size, limit, stb_arr_len2(p) STB__ARGS);
+}
+
+void stb__arr_setlen_(void **pp, int size, int newlen STB__PARAMS)
+{
+ void *p = *pp;
+ stb_arr_check2(p);
+ if (stb_arrcurmax2(p) < newlen || p == NULL) {
+ stb__arrsize_(pp, size, newlen, newlen STB__ARGS);
+ } else {
+ stb_arrhead2(p)->len = newlen;
+ }
+}
+
+void stb__arr_addlen_(void **p, int size, int addlen STB__PARAMS)
+{
+ stb__arr_setlen_(p, size, stb_arr_len2(*p) + addlen STB__ARGS);
+}
+
+void stb__arr_insertn_(void **pp, int size, int i, int n STB__PARAMS)
+{
+ void *p = *pp;
+ if (n) {
+ int z;
+
+ if (p == NULL) {
+ stb__arr_addlen_(pp, size, n STB__ARGS);
+ return;
+ }
+
+ z = stb_arr_len2(p);
+ stb__arr_addlen_(&p, size, n STB__ARGS);
+ memmove((char *) p + (i+n)*size, (char *) p + i*size, size * (z-i));
+ }
+ *pp = p;
+}
+
+void stb__arr_deleten_(void **pp, int size, int i, int n STB__PARAMS)
+{
+ void *p = *pp;
+ if (n) {
+ memmove((char *) p + i*size, (char *) p + (i+n)*size, size * (stb_arr_len2(p)-(i+n)));
+ stb_arrhead2(p)->len -= n;
+ }
+ *pp = p;
+}
+
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Hashing
+//
+// typical use for this is to make a power-of-two hash table.
+//
+// let N = size of table (2^n)
+// let H = stb_hash(str)
+// let S = stb_rehash(H) | 1
+//
+// then hash probe sequence P(i) for i=0..N-1
+// P(i) = (H + S*i) & (N-1)
+//
+// the idea is that H has 32 bits of hash information, but the
+// table has only, say, 2^20 entries so only uses 20 of the bits.
+// then by rehashing the original H we get 2^12 different probe
+// sequences for a given initial probe location. (So it's optimal
+// for 64K tables and its optimality decreases past that.)
+//
+// ok, so I've added something that generates _two separate_
+// 32-bit hashes simultaneously which should scale better to
+// very large tables.
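+//
+// a minimal lookup loop built on that probe sequence might look like
+// this (with hypothetical 'keys' and 'N' belonging to your own table):
+//
+//    int i;
+//    unsigned int h = stb_hash(str);
+//    unsigned int s = stb_rehash(h) | 1;
+//    for (i=0; i < N; ++i) {
+//       unsigned int slot = (h + s*i) & (N-1);
+//       if (keys[slot] && 0 == strcmp(keys[slot], str)) return slot;
+//    }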
+
+
+STB_EXTERN unsigned int stb_hash(char *str);
+STB_EXTERN unsigned int stb_hashptr(void *p);
+STB_EXTERN unsigned int stb_hashlen(char *str, int len);
+STB_EXTERN unsigned int stb_rehash_improved(unsigned int v);
+STB_EXTERN unsigned int stb_hash_fast(void *p, int len);
+STB_EXTERN unsigned int stb_hash2(char *str, unsigned int *hash2_ptr);
+STB_EXTERN unsigned int stb_hash_number(unsigned int hash);
+
+#define stb_rehash(x) ((x) + ((x) >> 6) + ((x) >> 19))
+
+#ifdef STB_DEFINE
+unsigned int stb_hash(char *str)
+{
+ unsigned int hash = 0;
+ while (*str)
+ hash = (hash << 7) + (hash >> 25) + *str++;
+ return hash + (hash >> 16);
+}
+
+unsigned int stb_hashlen(char *str, int len)
+{
+ unsigned int hash = 0;
+ while (len-- > 0 && *str)
+ hash = (hash << 7) + (hash >> 25) + *str++;
+ return hash + (hash >> 16);
+}
+
+unsigned int stb_hashptr(void *p)
+{
+ unsigned int x = (unsigned int)(size_t) p;
+
+ // typically lacking in low bits and high bits
+ x = stb_rehash(x);
+ x += x << 16;
+
+ // pearson's shuffle
+ x ^= x << 3;
+ x += x >> 5;
+ x ^= x << 2;
+ x += x >> 15;
+ x ^= x << 10;
+ return stb_rehash(x);
+}
+
+unsigned int stb_rehash_improved(unsigned int v)
+{
+ return stb_hashptr((void *)(size_t) v);
+}
+
+unsigned int stb_hash2(char *str, unsigned int *hash2_ptr)
+{
+ unsigned int hash1 = 0x3141592c;
+ unsigned int hash2 = 0x77f044ed;
+ while (*str) {
+ hash1 = (hash1 << 7) + (hash1 >> 25) + *str;
+ hash2 = (hash2 << 11) + (hash2 >> 21) + *str;
+ ++str;
+ }
+ *hash2_ptr = hash2 + (hash1 >> 16);
+ return hash1 + (hash2 >> 16);
+}
+
+// Paul Hsieh hash
+#define stb__get16(p) ((p)[0] | ((p)[1] << 8))
+
+unsigned int stb_hash_fast(void *p, int len)
+{
+ unsigned char *q = (unsigned char *) p;
+ unsigned int hash = len;
+
+ if (len <= 0 || q == NULL) return 0;
+
+ /* Main loop */
+ for (;len > 3; len -= 4) {
+ unsigned int val;
+ hash += stb__get16(q);
+ val = (stb__get16(q+2) << 11);
+ hash = (hash << 16) ^ hash ^ val;
+ q += 4;
+ hash += hash >> 11;
+ }
+
+ /* Handle end cases */
+ switch (len) {
+ case 3: hash += stb__get16(q);
+ hash ^= hash << 16;
+ hash ^= q[2] << 18;
+ hash += hash >> 11;
+ break;
+ case 2: hash += stb__get16(q);
+ hash ^= hash << 11;
+ hash += hash >> 17;
+ break;
+ case 1: hash += q[0];
+ hash ^= hash << 10;
+ hash += hash >> 1;
+ break;
+ case 0: break;
+ }
+
+ /* Force "avalanching" of final 127 bits */
+ hash ^= hash << 3;
+ hash += hash >> 5;
+ hash ^= hash << 4;
+ hash += hash >> 17;
+ hash ^= hash << 25;
+ hash += hash >> 6;
+
+ return hash;
+}
+
+unsigned int stb_hash_number(unsigned int hash)
+{
+ hash ^= hash << 3;
+ hash += hash >> 5;
+ hash ^= hash << 4;
+ hash += hash >> 17;
+ hash ^= hash << 25;
+ hash += hash >> 6;
+ return hash;
+}
+
+#endif
+
+#ifdef STB_PERFECT_HASH
+//////////////////////////////////////////////////////////////////////////////
+//
+// Perfect hashing for ints/pointers
+//
+// This is mainly useful for making faster pointer-indexed tables
+// that don't change frequently. E.g. for stb_ischar().
+//
+
+typedef struct
+{
+ stb_uint32 addend;
+ stb_uint multiplicand;
+ stb_uint b_mask;
+ stb_uint8 small_bmap[16];
+ stb_uint16 *large_bmap;
+
+ stb_uint table_mask;
+ stb_uint32 *table;
+} stb_perfect;
+
+STB_EXTERN int stb_perfect_create(stb_perfect *,unsigned int*,int n);
+STB_EXTERN void stb_perfect_destroy(stb_perfect *);
+STB_EXTERN int stb_perfect_hash(stb_perfect *, unsigned int x);
+extern int stb_perfect_hash_max_failures;
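+
+// sketch of typical usage (the keys here are arbitrary example values):
+//
+//    stb_perfect p;
+//    unsigned int keys[3] = { 10, 20, 30 };
+//    if (stb_perfect_create(&p, keys, 3)) {    // returns 0 on duplicate keys
+//       int slot = stb_perfect_hash(&p, 20);   // >= 0 for keys in the set
+//       int miss = stb_perfect_hash(&p, 99);   // -1 for keys not in it
+//       stb_perfect_destroy(&p);
+//    }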
+
+#ifdef STB_DEFINE
+
+int stb_perfect_hash_max_failures;
+
+int stb_perfect_hash(stb_perfect *p, unsigned int x)
+{
+ stb_uint m = x * p->multiplicand;
+ stb_uint y = x >> 16;
+ stb_uint bv = (m >> 24) + y;
+ stb_uint av = (m + y) >> 12;
+ if (p->table == NULL) return -1; // uninitialized table fails
+ bv &= p->b_mask;
+ av &= p->table_mask;
+ if (p->large_bmap)
+ av ^= p->large_bmap[bv];
+ else
+ av ^= p->small_bmap[bv];
+ return p->table[av] == x ? av : -1;
+}
+
+static void stb__perfect_prehash(stb_perfect *p, stb_uint x, stb_uint16 *a, stb_uint16 *b)
+{
+ stb_uint m = x * p->multiplicand;
+ stb_uint y = x >> 16;
+ stb_uint bv = (m >> 24) + y;
+ stb_uint av = (m + y) >> 12;
+ bv &= p->b_mask;
+ av &= p->table_mask;
+ *b = bv;
+ *a = av;
+}
+
+static unsigned long stb__perfect_rand(void)
+{
+ static unsigned long stb__rand;
+ stb__rand = stb__rand * 2147001325 + 715136305;
+ return 0x31415926 ^ ((stb__rand >> 16) + (stb__rand << 16));
+}
+
+typedef struct {
+ unsigned short count;
+ unsigned short b;
+ unsigned short map;
+ unsigned short *entries;
+} stb__slot;
+
+static int stb__slot_compare(const void *p, const void *q)
+{
+ stb__slot *a = (stb__slot *) p;
+ stb__slot *b = (stb__slot *) q;
+ return a->count > b->count ? -1 : a->count < b->count; // sort large to small
+}
+
+int stb_perfect_create(stb_perfect *p, unsigned int *v, int n)
+{
+ unsigned int buffer1[64], buffer2[64], buffer3[64], buffer4[64], buffer5[32];
+ unsigned short *as = (unsigned short *) stb_temp(buffer1, sizeof(*v)*n);
+ unsigned short *bs = (unsigned short *) stb_temp(buffer2, sizeof(*v)*n);
+ unsigned short *entries = (unsigned short *) stb_temp(buffer4, sizeof(*entries) * n);
+ int size = 1 << stb_log2_ceil(n), bsize=8;
+ int failure = 0,i,j,k;
+
+ assert(n <= 32768);
+ p->large_bmap = NULL;
+
+ for(;;) {
+ stb__slot *bcount = (stb__slot *) stb_temp(buffer3, sizeof(*bcount) * bsize);
+ unsigned short *bloc = (unsigned short *) stb_temp(buffer5, sizeof(*bloc) * bsize);
+ unsigned short *e;
+ int bad=0;
+
+ p->addend = stb__perfect_rand();
+ p->multiplicand = stb__perfect_rand() | 1;
+ p->table_mask = size-1;
+ p->b_mask = bsize-1;
+ p->table = (stb_uint32 *) malloc(size * sizeof(*p->table));
+
+ for (i=0; i < bsize; ++i) {
+ bcount[i].b = i;
+ bcount[i].count = 0;
+ bcount[i].map = 0;
+ }
+ for (i=0; i < n; ++i) {
+ stb__perfect_prehash(p, v[i], as+i, bs+i);
+ ++bcount[bs[i]].count;
+ }
+ qsort(bcount, bsize, sizeof(*bcount), stb__slot_compare);
+      e = entries; // now set up their entries index
+ for (i=0; i < bsize; ++i) {
+ bcount[i].entries = e;
+ e += bcount[i].count;
+ bcount[i].count = 0;
+ bloc[bcount[i].b] = i;
+ }
+ // now fill them out
+ for (i=0; i < n; ++i) {
+ int b = bs[i];
+ int w = bloc[b];
+ bcount[w].entries[bcount[w].count++] = i;
+ }
+ stb_tempfree(buffer5,bloc);
+ // verify
+ for (i=0; i < bsize; ++i)
+ for (j=0; j < bcount[i].count; ++j)
+ assert(bs[bcount[i].entries[j]] == bcount[i].b);
+ memset(p->table, 0, size*sizeof(*p->table));
+
+ // check if any b has duplicate a
+ for (i=0; i < bsize; ++i) {
+ if (bcount[i].count > 1) {
+ for (j=0; j < bcount[i].count; ++j) {
+ if (p->table[as[bcount[i].entries[j]]])
+ bad = 1;
+ p->table[as[bcount[i].entries[j]]] = 1;
+ }
+ for (j=0; j < bcount[i].count; ++j) {
+ p->table[as[bcount[i].entries[j]]] = 0;
+ }
+ if (bad) break;
+ }
+ }
+
+ if (!bad) {
+ // go through the bs and populate the table, first fit
+ for (i=0; i < bsize; ++i) {
+ if (bcount[i].count) {
+ // go through the candidate table[b] values
+ for (j=0; j < size; ++j) {
+ // go through the a values and see if they fit
+ for (k=0; k < bcount[i].count; ++k) {
+ int a = as[bcount[i].entries[k]];
+ if (p->table[(a^j)&p->table_mask]) {
+ break; // fails
+ }
+ }
+ // if succeeded, accept
+ if (k == bcount[i].count) {
+ bcount[i].map = j;
+ for (k=0; k < bcount[i].count; ++k) {
+ int a = as[bcount[i].entries[k]];
+ p->table[(a^j)&p->table_mask] = 1;
+ }
+ break;
+ }
+ }
+ if (j == size)
+ break; // no match for i'th entry, so break out in failure
+ }
+ }
+ if (i == bsize) {
+ // success... fill out map
+ if (bsize <= 16 && size <= 256) {
+ p->large_bmap = NULL;
+ for (i=0; i < bsize; ++i)
+ p->small_bmap[bcount[i].b] = (stb_uint8) bcount[i].map;
+ } else {
+ p->large_bmap = (unsigned short *) malloc(sizeof(*p->large_bmap) * bsize);
+ for (i=0; i < bsize; ++i)
+ p->large_bmap[bcount[i].b] = bcount[i].map;
+ }
+
+ // initialize table to v[0], so empty slots will fail
+ for (i=0; i < size; ++i)
+ p->table[i] = v[0];
+
+ for (i=0; i < n; ++i)
+ if (p->large_bmap)
+ p->table[as[i] ^ p->large_bmap[bs[i]]] = v[i];
+ else
+ p->table[as[i] ^ p->small_bmap[bs[i]]] = v[i];
+
+ // and now validate that none of them collided
+ for (i=0; i < n; ++i)
+ assert(stb_perfect_hash(p, v[i]) >= 0);
+
+ stb_tempfree(buffer3, bcount);
+ break;
+ }
+ }
+ free(p->table);
+ p->table = NULL;
+ stb_tempfree(buffer3, bcount);
+
+ ++failure;
+ if (failure >= 4 && bsize < size) bsize *= 2;
+ if (failure >= 8 && (failure & 3) == 0 && size < 4*n) {
+ size *= 2;
+ bsize *= 2;
+ }
+ if (failure == 6) {
+ // make sure the input data is unique, so we don't infinite loop
+ unsigned int *data = (unsigned int *) stb_temp(buffer3, n * sizeof(*data));
+ memcpy(data, v, sizeof(*data) * n);
+ qsort(data, n, sizeof(*data), stb_intcmp(0));
+ for (i=1; i < n; ++i) {
+ if (data[i] == data[i-1])
+ size = 0; // size is return value, so 0 it
+ }
+ stb_tempfree(buffer3, data);
+ if (!size) break;
+ }
+ }
+
+ if (failure > stb_perfect_hash_max_failures)
+ stb_perfect_hash_max_failures = failure;
+
+ stb_tempfree(buffer1, as);
+ stb_tempfree(buffer2, bs);
+ stb_tempfree(buffer4, entries);
+
+ return size;
+}
+
+void stb_perfect_destroy(stb_perfect *p)
+{
+ if (p->large_bmap) free(p->large_bmap);
+ if (p->table ) free(p->table);
+ p->large_bmap = NULL;
+ p->table = NULL;
+ p->b_mask = 0;
+ p->table_mask = 0;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Perfect hash clients
+
+STB_EXTERN int stb_ischar(char s, char *set);
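+
+// stb_ischar(c,set) reports whether 'c' occurs in the string 'set'; the set
+// is identified by its pointer (via a perfect hash), so pass the SAME string
+// pointer on every call. Illustrative sketch (names invented here):
+//
+//    static char *vowels = "aeiou";
+//    if (stb_ischar(ch, vowels)) { /* ch is a vowel */ }
+//    ...
+//    stb_ischar(0, NULL);   // special case: frees all internal tables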
+
+#ifdef STB_DEFINE
+
+int stb_ischar(char c, char *set)
+{
+ static unsigned char bit[8] = { 1,2,4,8,16,32,64,128 };
+ static stb_perfect p;
+ static unsigned char (*tables)[256];
+ static char ** sets = NULL;
+
+ int z = stb_perfect_hash(&p, (int)(size_t) set);
+ if (z < 0) {
+ int i,k,n,j,f;
+ // special code that means free all existing data
+ if (set == NULL) {
+ stb_arr_free(sets);
+ free(tables);
+ tables = NULL;
+ stb_perfect_destroy(&p);
+ return 0;
+ }
+ stb_arr_push(sets, set);
+ stb_perfect_destroy(&p);
+ n = stb_perfect_create(&p, (unsigned int *) (char **) sets, stb_arr_len(sets));
+ assert(n != 0);
+ k = (n+7) >> 3;
+ tables = (unsigned char (*)[256]) realloc(tables, sizeof(*tables) * k);
+ memset(tables, 0, sizeof(*tables) * k);
+ for (i=0; i < stb_arr_len(sets); ++i) {
+ k = stb_perfect_hash(&p, (int)(size_t) sets[i]);
+ assert(k >= 0);
+ n = k >> 3;
+ f = bit[k&7];
+ for (j=0; !j || sets[i][j]; ++j) {
+ tables[n][(unsigned char) sets[i][j]] |= f;
+ }
+ }
+ z = stb_perfect_hash(&p, (int)(size_t) set);
+ }
+ return tables[z >> 3][(unsigned char) c] & bit[z & 7];
+}
+
+#endif
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Instantiated data structures
+//
+// This is an attempt to implement a templated data structure.
+//
+// Hash table: call stb_define_hash(TYPE,N,KEY,EMPTY,DEL,HASH,VALUE)
+//     TYPE      -- will define a structure type containing the hash table
+//     N         -- the name, will prefix functions named:
+//                     N create
+//                     N destroy
+//                     N get
+//                     N set, N add, N update,
+//                     N remove
+//     KEY       -- the type of the key. 'x == y' must be valid
+//     EMPTY,DEL -- two key values never used by the app; used internally
+//                  to mark empty and deleted slots in the hashtable
+//     HASH      -- a piece of code ending with 'return' that hashes key 'k'
+//     VALUE     -- the type of the value. 'x = y' must be valid
+//
+// Note that stb_define_hash_base can be used to define more sophisticated
+// hash tables, e.g. those that make copies of the key or use special
+// comparisons (e.g. strcmp).
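+//
+// For example, a hypothetical instantiation (names invented here, shown
+// for illustration only):
+//
+//    stb_define_hash(intmap, imap_, int, 0x7ffffffe, 0x7fffffff,
+//                    return stb_rehash(k);, float)
+//
+// defines the type 'intmap' plus imap_create(), imap_set(), imap_get(),
+// imap_remove(), etc.; the two reserved keys 0x7ffffffe and 0x7fffffff
+// must never be passed in by the application.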
+
+#define STB_(prefix,name) stb__##prefix##name
+#define STB__(prefix,name) prefix##name
+#define STB__use(x) x
+#define STB__skip(x)
+
+#define stb_declare_hash(PREFIX,TYPE,N,KEY,VALUE) \
+ typedef struct stb__st_##TYPE TYPE;\
+ PREFIX int STB__(N, init)(TYPE *h, int count);\
+ PREFIX int STB__(N, memory_usage)(TYPE *h);\
+ PREFIX TYPE * STB__(N, create)(void);\
+ PREFIX TYPE * STB__(N, copy)(TYPE *h);\
+ PREFIX void STB__(N, destroy)(TYPE *h);\
+ PREFIX int STB__(N,get_flag)(TYPE *a, KEY k, VALUE *v);\
+ PREFIX VALUE STB__(N,get)(TYPE *a, KEY k);\
+ PREFIX int STB__(N, set)(TYPE *a, KEY k, VALUE v);\
+ PREFIX int STB__(N, add)(TYPE *a, KEY k, VALUE v);\
+ PREFIX int STB__(N, update)(TYPE*a,KEY k,VALUE v);\
+ PREFIX int STB__(N, remove)(TYPE *a, KEY k, VALUE *v);
+
+#define STB_nocopy(x) (x)
+#define STB_nodelete(x) 0
+#define STB_nofields
+#define STB_nonullvalue(x)
+#define STB_nullvalue(x) x
+#define STB_safecompare(x) x
+#define STB_nosafe(x)
+#define STB_noprefix
+
+#ifdef __GNUC__
+#define STB__nogcc(x)
+#else
+#define STB__nogcc(x) x
+#endif
+
+#define stb_define_hash_base(PREFIX,TYPE,FIELDS,N,NC,LOAD_FACTOR, \
+ KEY,EMPTY,DEL,COPY,DISPOSE,SAFE, \
+ VCOMPARE,CCOMPARE,HASH, \
+ VALUE,HASVNULL,VNULL) \
+ \
+typedef struct \
+{ \
+ KEY k; \
+ VALUE v; \
+} STB_(N,_hashpair); \
+ \
+STB__nogcc( typedef struct stb__st_##TYPE TYPE; ) \
+struct stb__st_##TYPE { \
+ FIELDS \
+ STB_(N,_hashpair) *table; \
+ unsigned int mask; \
+ int count, limit; \
+ int deleted; \
+ \
+ int delete_threshhold; \
+ int grow_threshhold; \
+ int shrink_threshhold; \
+ unsigned char alloced, has_empty, has_del; \
+ VALUE ev; VALUE dv; \
+}; \
+ \
+static unsigned int STB_(N, hash)(KEY k) \
+{ \
+ HASH \
+} \
+ \
+PREFIX int STB__(N, init)(TYPE *h, int count) \
+{ \
+ int i; \
+ if (count < 4) count = 4; \
+ h->limit = count; \
+ h->count = 0; \
+ h->mask = count-1; \
+ h->deleted = 0; \
+ h->grow_threshhold = (int) (count * LOAD_FACTOR); \
+ h->has_empty = h->has_del = 0; \
+ h->alloced = 0; \
+ if (count <= 64) \
+ h->shrink_threshhold = 0; \
+ else \
+ h->shrink_threshhold = (int) (count * (LOAD_FACTOR/2.25)); \
+ h->delete_threshhold = (int) (count * (1-LOAD_FACTOR)/2); \
+ h->table = (STB_(N,_hashpair)*) malloc(sizeof(h->table[0]) * count); \
+ if (h->table == NULL) return 0; \
+ /* ideally this gets turned into a memset32 automatically */ \
+ for (i=0; i < count; ++i) \
+ h->table[i].k = EMPTY; \
+ return 1; \
+} \
+ \
+PREFIX int STB__(N, memory_usage)(TYPE *h) \
+{ \
+ return sizeof(*h) + h->limit * sizeof(h->table[0]); \
+} \
+ \
+PREFIX TYPE * STB__(N, create)(void) \
+{ \
+ TYPE *h = (TYPE *) malloc(sizeof(*h)); \
+ if (h) { \
+ if (STB__(N, init)(h, 16)) \
+ h->alloced = 1; \
+ else { free(h); h=NULL; } \
+ } \
+ return h; \
+} \
+ \
+PREFIX void STB__(N, destroy)(TYPE *a) \
+{ \
+ int i; \
+ for (i=0; i < a->limit; ++i) \
+ if (!CCOMPARE(a->table[i].k,EMPTY) && !CCOMPARE(a->table[i].k, DEL)) \
+ DISPOSE(a->table[i].k); \
+ free(a->table); \
+ if (a->alloced) \
+ free(a); \
+} \
+ \
+static void STB_(N, rehash)(TYPE *a, int count); \
+ \
+PREFIX int STB__(N,get_flag)(TYPE *a, KEY k, VALUE *v) \
+{ \
+ unsigned int h = STB_(N, hash)(k); \
+ unsigned int n = h & a->mask, s; \
+ if (CCOMPARE(k,EMPTY)){ if (a->has_empty) *v = a->ev; return a->has_empty;}\
+ if (CCOMPARE(k,DEL)) { if (a->has_del ) *v = a->dv; return a->has_del; }\
+ if (CCOMPARE(a->table[n].k,EMPTY)) return 0; \
+ SAFE(if (!CCOMPARE(a->table[n].k,DEL))) \
+ if (VCOMPARE(a->table[n].k,k)) { *v = a->table[n].v; return 1; } \
+ s = stb_rehash(h) | 1; \
+ for(;;) { \
+ n = (n + s) & a->mask; \
+ if (CCOMPARE(a->table[n].k,EMPTY)) return 0; \
+ SAFE(if (CCOMPARE(a->table[n].k,DEL)) continue;) \
+ if (VCOMPARE(a->table[n].k,k)) \
+ { *v = a->table[n].v; return 1; } \
+ } \
+} \
+ \
+HASVNULL( \
+ PREFIX VALUE STB__(N,get)(TYPE *a, KEY k) \
+ { \
+ VALUE v; \
+ if (STB__(N,get_flag)(a,k,&v)) return v; \
+ else return VNULL; \
+ } \
+) \
+ \
+PREFIX int STB__(N,getkey)(TYPE *a, KEY k, KEY *kout) \
+{ \
+ unsigned int h = STB_(N, hash)(k); \
+ unsigned int n = h & a->mask, s; \
+ if (CCOMPARE(k,EMPTY)||CCOMPARE(k,DEL)) return 0; \
+ if (CCOMPARE(a->table[n].k,EMPTY)) return 0; \
+ SAFE(if (!CCOMPARE(a->table[n].k,DEL))) \
+ if (VCOMPARE(a->table[n].k,k)) { *kout = a->table[n].k; return 1; } \
+ s = stb_rehash(h) | 1; \
+ for(;;) { \
+ n = (n + s) & a->mask; \
+ if (CCOMPARE(a->table[n].k,EMPTY)) return 0; \
+ SAFE(if (CCOMPARE(a->table[n].k,DEL)) continue;) \
+ if (VCOMPARE(a->table[n].k,k)) \
+ { *kout = a->table[n].k; return 1; } \
+ } \
+} \
+ \
+static int STB_(N,addset)(TYPE *a, KEY k, VALUE v, \
+ int allow_new, int allow_old, int copy) \
+{ \
+ unsigned int h = STB_(N, hash)(k); \
+ unsigned int n = h & a->mask; \
+ int b = -1; \
+ if (CCOMPARE(k,EMPTY)) { \
+ if (a->has_empty ? allow_old : allow_new) { \
+ n=a->has_empty; a->ev = v; a->has_empty = 1; return !n; \
+ } else return 0; \
+ } \
+ if (CCOMPARE(k,DEL)) { \
+ if (a->has_del ? allow_old : allow_new) { \
+ n=a->has_del; a->dv = v; a->has_del = 1; return !n; \
+ } else return 0; \
+ } \
+ if (!CCOMPARE(a->table[n].k, EMPTY)) { \
+ unsigned int s; \
+ if (CCOMPARE(a->table[n].k, DEL)) \
+ b = n; \
+ else if (VCOMPARE(a->table[n].k,k)) { \
+ if (allow_old) \
+ a->table[n].v = v; \
+ return !allow_new; \
+ } \
+ s = stb_rehash(h) | 1; \
+ for(;;) { \
+ n = (n + s) & a->mask; \
+ if (CCOMPARE(a->table[n].k, EMPTY)) break; \
+ if (CCOMPARE(a->table[n].k, DEL)) { \
+ if (b < 0) b = n; \
+ } else if (VCOMPARE(a->table[n].k,k)) { \
+ if (allow_old) \
+ a->table[n].v = v; \
+ return !allow_new; \
+ } \
+ } \
+ } \
+ if (!allow_new) return 0; \
+ if (b < 0) b = n; else --a->deleted; \
+ a->table[b].k = copy ? COPY(k) : k; \
+ a->table[b].v = v; \
+ ++a->count; \
+ if (a->count > a->grow_threshhold) \
+ STB_(N,rehash)(a, a->limit*2); \
+ return 1; \
+} \
+ \
+PREFIX int STB__(N, set)(TYPE *a, KEY k, VALUE v){return STB_(N,addset)(a,k,v,1,1,1);}\
+PREFIX int STB__(N, add)(TYPE *a, KEY k, VALUE v){return STB_(N,addset)(a,k,v,1,0,1);}\
+PREFIX int STB__(N, update)(TYPE*a,KEY k,VALUE v){return STB_(N,addset)(a,k,v,0,1,1);}\
+ \
+PREFIX int STB__(N, remove)(TYPE *a, KEY k, VALUE *v) \
+{ \
+ unsigned int h = STB_(N, hash)(k); \
+ unsigned int n = h & a->mask, s; \
+ if (CCOMPARE(k,EMPTY)) { if (a->has_empty) { if(v)*v = a->ev; a->has_empty=0; return 1; } return 0; } \
+ if (CCOMPARE(k,DEL)) { if (a->has_del ) { if(v)*v = a->dv; a->has_del =0; return 1; } return 0; } \
+ if (CCOMPARE(a->table[n].k,EMPTY)) return 0; \
+ if (SAFE(CCOMPARE(a->table[n].k,DEL) || ) !VCOMPARE(a->table[n].k,k)) { \
+ s = stb_rehash(h) | 1; \
+ for(;;) { \
+ n = (n + s) & a->mask; \
+ if (CCOMPARE(a->table[n].k,EMPTY)) return 0; \
+ SAFE(if (CCOMPARE(a->table[n].k, DEL)) continue;) \
+ if (VCOMPARE(a->table[n].k,k)) break; \
+ } \
+ } \
+ DISPOSE(a->table[n].k); \
+ a->table[n].k = DEL; \
+ --a->count; \
+ ++a->deleted; \
+ if (v != NULL) \
+ *v = a->table[n].v; \
+ if (a->count < a->shrink_threshhold) \
+ STB_(N, rehash)(a, a->limit >> 1); \
+ else if (a->deleted > a->delete_threshhold) \
+ STB_(N, rehash)(a, a->limit); \
+ return 1; \
+} \
+ \
+PREFIX TYPE * STB__(NC, copy)(TYPE *a) \
+{ \
+ int i; \
+ TYPE *h = (TYPE *) malloc(sizeof(*h)); \
+ if (!h) return NULL; \
+ if (!STB__(N, init)(h, a->limit)) { free(h); return NULL; } \
+ h->count = a->count; \
+ h->deleted = a->deleted; \
+ h->alloced = 1; \
+ h->ev = a->ev; h->dv = a->dv; \
+ h->has_empty = a->has_empty; h->has_del = a->has_del; \
+ memcpy(h->table, a->table, h->limit * sizeof(h->table[0])); \
+ for (i=0; i < a->limit; ++i) \
+ if (!CCOMPARE(h->table[i].k,EMPTY) && !CCOMPARE(h->table[i].k,DEL)) \
+ h->table[i].k = COPY(h->table[i].k); \
+ return h; \
+} \
+ \
+static void STB_(N, rehash)(TYPE *a, int count) \
+{ \
+ int i; \
+ TYPE b; \
+ STB__(N, init)(&b, count); \
+ for (i=0; i < a->limit; ++i) \
+ if (!CCOMPARE(a->table[i].k,EMPTY) && !CCOMPARE(a->table[i].k,DEL)) \
+ STB_(N,addset)(&b, a->table[i].k, a->table[i].v,1,1,0); \
+ free(a->table); \
+ a->table = b.table; \
+ a->mask = b.mask; \
+ a->count = b.count; \
+ a->limit = b.limit; \
+ a->deleted = b.deleted; \
+ a->delete_threshhold = b.delete_threshhold; \
+ a->grow_threshhold = b.grow_threshhold; \
+ a->shrink_threshhold = b.shrink_threshhold; \
+}
+
+#define STB_equal(a,b) ((a) == (b))
+
+#define stb_define_hash(TYPE,N,KEY,EMPTY,DEL,HASH,VALUE) \
+ stb_define_hash_base(STB_noprefix, TYPE,STB_nofields,N,N,0.85f, \
+ KEY,EMPTY,DEL,STB_nocopy,STB_nodelete,STB_nosafe, \
+ STB_equal,STB_equal,HASH, \
+ VALUE,STB_nonullvalue,0)
+
+#define stb_define_hash_vnull(TYPE,N,KEY,EMPTY,DEL,HASH,VALUE,VNULL) \
+ stb_define_hash_base(STB_noprefix, TYPE,STB_nofields,N,N,0.85f, \
+ KEY,EMPTY,DEL,STB_nocopy,STB_nodelete,STB_nosafe, \
+ STB_equal,STB_equal,HASH, \
+ VALUE,STB_nullvalue,VNULL)
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_ptrmap
+//
+// An stb_ptrmap data structure is an O(1) hash table mapping pointers to
+// pointers. One application is to store "extra" data associated with
+// pointers, which is why it was originally called stb_extra.
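+//
+// Typical use (illustrative sketch):
+//
+//    stb_ptrmap *m = stb_ptrmap_new();
+//    stb_ptrmap_set(m, some_object, extra_data);
+//    void *extra = stb_ptrmap_get(m, some_object); // NULL if absent
+//    stb_ptrmap_delete(m, free);  // frees each stored value, then the map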
+
+stb_declare_hash(STB_EXTERN, stb_ptrmap, stb_ptrmap_, void *, void *)
+stb_declare_hash(STB_EXTERN, stb_idict, stb_idict_, stb_int32, stb_int32)
+stb_declare_hash(STB_EXTERN, stb_uidict, stbi_uidict_, stb_uint32, stb_uint32)
+
+STB_EXTERN void stb_ptrmap_delete(stb_ptrmap *e, void (*free_func)(void *));
+STB_EXTERN stb_ptrmap *stb_ptrmap_new(void);
+
+STB_EXTERN stb_idict * stb_idict_new_size(int size);
+STB_EXTERN void stb_idict_remove_all(stb_idict *e);
+STB_EXTERN void stb_uidict_reset(stb_uidict *e);
+
+#ifdef STB_DEFINE
+
+#define STB_EMPTY ((void *) 2)
+#define STB_EDEL ((void *) 6)
+
+stb_define_hash_base(STB_noprefix,stb_ptrmap, STB_nofields, stb_ptrmap_,stb_ptrmap_,0.85f,
+ void *,STB_EMPTY,STB_EDEL,STB_nocopy,STB_nodelete,STB_nosafe,
+ STB_equal,STB_equal,return stb_hashptr(k);,
+ void *,STB_nullvalue,NULL)
+
+stb_ptrmap *stb_ptrmap_new(void)
+{
+ return stb_ptrmap_create();
+}
+
+void stb_ptrmap_delete(stb_ptrmap *e, void (*free_func)(void *))
+{
+ int i;
+ if (free_func)
+ for (i=0; i < e->limit; ++i)
+ if (e->table[i].k != STB_EMPTY && e->table[i].k != STB_EDEL) {
+ if (free_func == free)
+ free(e->table[i].v); // allow STB_MALLOC_WRAPPER to operate
+ else
+ free_func(e->table[i].v);
+ }
+ stb_ptrmap_destroy(e);
+}
+
+// extra fields needed for stua_dict
+#define STB_IEMPTY ((int) 1)
+#define STB_IDEL ((int) 3)
+stb_define_hash_base(STB_noprefix, stb_idict, short type; short gc; STB_nofields, stb_idict_,stb_idict_,0.95f,
+ stb_int32,STB_IEMPTY,STB_IDEL,STB_nocopy,STB_nodelete,STB_nosafe,
+ STB_equal,STB_equal,
+ return stb_rehash_improved(k);,stb_int32,STB_nonullvalue,0)
+
+stb_idict * stb_idict_new_size(int size)
+{
+ stb_idict *e = (stb_idict *) malloc(sizeof(*e));
+ if (e) {
+ if (!stb_is_pow2(size))
+ size = 1 << stb_log2_ceil(size);
+ stb_idict_init(e, size);
+ e->alloced = 1;
+ }
+ return e;
+}
+
+void stb_idict_remove_all(stb_idict *e)
+{
+ int n;
+ for (n=0; n < e->limit; ++n)
+ e->table[n].k = STB_IEMPTY;
+ e->has_empty = e->has_del = 0;
+ e->count = 0;
+ e->deleted = 0;
+}
+
+stb_define_hash_base(STB_noprefix, stb_uidict, STB_nofields, stb_uidict_,stb_uidict_,0.85f,
+ stb_uint32,0xffffffff,0xfffffffe,STB_nocopy,STB_nodelete,STB_nosafe,
+ STB_equal,STB_equal,
+ return stb_rehash_improved(k);,stb_uint32,STB_nonullvalue,0)
+
+void stb_uidict_reset(stb_uidict *e)
+{
+ int n;
+ for (n=0; n < e->limit; ++n)
+ e->table[n].k = 0xffffffff;
+ e->has_empty = e->has_del = 0;
+ e->count = 0;
+ e->deleted = 0;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_sparse_ptr_matrix
+//
+// An stb_spmatrix data structure is an O(1) hash table storing an arbitrary
+// block of data for a given pair of pointers.
+//
+// If create=0, stb_sparse_ptr_matrix_get returns NULL when no block exists
+// for the pair; if create is nonzero, a zero-initialized block is allocated.
+
+typedef struct stb__st_stb_spmatrix stb_spmatrix;
+
+STB_EXTERN stb_spmatrix * stb_sparse_ptr_matrix_new(int val_size);
+STB_EXTERN void stb_sparse_ptr_matrix_free(stb_spmatrix *z);
+STB_EXTERN void * stb_sparse_ptr_matrix_get(stb_spmatrix *z, void *a, void *b, int create);
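+
+// Usage sketch (hypothetical value type, for illustration only):
+//
+//    typedef struct { float weight; } edge_data;
+//    stb_spmatrix *m = stb_sparse_ptr_matrix_new(sizeof(edge_data));
+//    edge_data *e = (edge_data *) stb_sparse_ptr_matrix_get(m, obj_a, obj_b, 1);
+//    e->weight = 1.0f;   // block was zero-initialized when created
+//    stb_sparse_ptr_matrix_free(m);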
+
+#ifdef STB_DEFINE
+typedef struct
+{
+ void *a;
+ void *b;
+} stb__ptrpair;
+
+static stb__ptrpair stb__ptrpair_empty = { (void *) 1, (void *) 1 };
+static stb__ptrpair stb__ptrpair_del = { (void *) 2, (void *) 2 };
+
+#define STB__equal_ptrpair(x,y) ((x).a == (y).a && (x).b == (y).b)
+
+stb_define_hash_base(STB_noprefix, stb_spmatrix, int val_size; void *arena;, stb__spmatrix_,stb__spmatrix_, 0.85f,
+ stb__ptrpair, stb__ptrpair_empty, stb__ptrpair_del,
+ STB_nocopy, STB_nodelete, STB_nosafe,
+ STB__equal_ptrpair, STB__equal_ptrpair, return stb_rehash(stb_hashptr(k.a))+stb_hashptr(k.b);,
+ void *, STB_nullvalue, 0)
+
+stb_spmatrix *stb_sparse_ptr_matrix_new(int val_size)
+{
+ stb_spmatrix *m = stb__spmatrix_create();
+ if (m) m->val_size = val_size;
+ if (m) m->arena = stb_malloc_global(1);
+ return m;
+}
+
+void stb_sparse_ptr_matrix_free(stb_spmatrix *z)
+{
+ if (z->arena) stb_free(z->arena);
+ stb__spmatrix_destroy(z);
+}
+
+void *stb_sparse_ptr_matrix_get(stb_spmatrix *z, void *a, void *b, int create)
+{
+ stb__ptrpair t = { a,b };
+ void *data = stb__spmatrix_get(z, t);
+ if (!data && create) {
+ data = stb_malloc_raw(z->arena, z->val_size);
+ if (!data) return NULL;
+ memset(data, 0, z->val_size);
+ stb__spmatrix_add(z, t, data);
+ }
+ return data;
+}
+#endif
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// SDICT: Hash Table for Strings (symbol table)
+//
+// if "use_arena=1", then strings will be copied
+// into blocks and never freed until the sdict is freed;
+// otherwise they're malloc()ed and free()d on the fly.
+// (specify use_arena=1 if you never call stb_sdict_remove)
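+//
+// Example (illustrative sketch):
+//
+//    stb_sdict *d = stb_sdict_new(1);       // arena-backed key storage
+//    stb_sdict_set(d, "name", value_ptr);   // the key string is copied
+//    void *p = stb_sdict_get(d, "name");    // NULL if not present
+//    stb_sdict_delete(d);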
+
+stb_declare_hash(STB_EXTERN, stb_sdict, stb_sdict_, char *, void *)
+
+STB_EXTERN stb_sdict * stb_sdict_new(int use_arena);
+STB_EXTERN stb_sdict * stb_sdict_copy(stb_sdict*);
+STB_EXTERN void stb_sdict_delete(stb_sdict *);
+STB_EXTERN void * stb_sdict_change(stb_sdict *, char *str, void *p);
+STB_EXTERN int stb_sdict_count(stb_sdict *d);
+
+STB_EXTERN int stb_sdict_internal_limit(stb_sdict *d);
+STB_EXTERN char * stb_sdict_internal_key(stb_sdict *d, int n);
+STB_EXTERN void * stb_sdict_internal_value(stb_sdict *d, int n);
+
+#define stb_sdict_for(d,i,q,z) \
+ for(i=0; i < stb_sdict_internal_limit(d) ? (q=stb_sdict_internal_key(d,i),z=stb_sdict_internal_value(d,i),1) : 0; ++i) \
+ if (q==NULL||q==(void *) 1);else // reversed makes macro friendly
+
+#ifdef STB_DEFINE
+
+// if in same translation unit, for speed, don't call accessors
+#undef stb_sdict_for
+#define stb_sdict_for(d,i,q,z) \
+ for(i=0; i < (d)->limit ? (q=(d)->table[i].k,z=(d)->table[i].v,1) : 0; ++i) \
+ if (q==NULL||q==(void *) 1);else // reversed makes macro friendly
+
+#define STB_DEL ((void *) 1)
+#define STB_SDEL ((char *) 1)
+
+#define stb_sdict__copy(x) \
+ stb_p_strcpy_s(a->arena ? stb_malloc_string(a->arena, strlen(x)+1) \
+ : (char *) malloc(strlen(x)+1), strlen(x)+1, x)
+
+#define stb_sdict__dispose(x) if (!a->arena) free(x)
+
+stb_define_hash_base(STB_noprefix, stb_sdict, void*arena;, stb_sdict_,stb_sdictinternal_, 0.85f,
+ char *, NULL, STB_SDEL, stb_sdict__copy, stb_sdict__dispose,
+ STB_safecompare, !strcmp, STB_equal, return stb_hash(k);,
+ void *, STB_nullvalue, NULL)
+
+int stb_sdict_count(stb_sdict *a)
+{
+ return a->count;
+}
+
+int stb_sdict_internal_limit(stb_sdict *a)
+{
+ return a->limit;
+}
+char* stb_sdict_internal_key(stb_sdict *a, int n)
+{
+ return a->table[n].k;
+}
+void* stb_sdict_internal_value(stb_sdict *a, int n)
+{
+ return a->table[n].v;
+}
+
+stb_sdict * stb_sdict_new(int use_arena)
+{
+ stb_sdict *d = stb_sdict_create();
+ if (d == NULL) return NULL;
+ d->arena = use_arena ? stb_malloc_global(1) : NULL;
+ return d;
+}
+
+stb_sdict* stb_sdict_copy(stb_sdict *old)
+{
+ stb_sdict *n;
+ void *old_arena = old->arena;
+ void *new_arena = old_arena ? stb_malloc_global(1) : NULL;
+ old->arena = new_arena;
+ n = stb_sdictinternal_copy(old);
+ old->arena = old_arena;
+ if (n)
+ n->arena = new_arena;
+ else if (new_arena)
+ stb_free(new_arena);
+ return n;
+}
+
+
+void stb_sdict_delete(stb_sdict *d)
+{
+ if (d->arena)
+ stb_free(d->arena);
+ stb_sdict_destroy(d);
+}
+
+void * stb_sdict_change(stb_sdict *d, char *str, void *p)
+{
+ void *q = stb_sdict_get(d, str);
+ stb_sdict_set(d, str, p);
+ return q;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Instantiated data structures
+//
+// This is an attempt to implement a templated data structure.
+// You define a struct foo, embed the tree's link fields (pointers
+// to struct foo) in it, and then invoke the instantiator, which
+// creates the functions that implement the data structure. This
+// requires massive undebuggable #defines, so we limit the cases
+// where we do this.
+//
+// An AA tree is an encoding of a 2-3 tree, whereas red-black trees encode a
+// 2-3-4 tree; the AA tree needs much simpler code because it has fewer cases.
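+//
+// Instantiation sketch (hypothetical names, for illustration only):
+//
+//    typedef struct stb__st_node {
+//       int key;
+//       struct stb__st_node stb_bst_fields(nd_);  // embeds left/right/level
+//    } node;
+//    stb_bst(node, nd_, node_tree, NT, key, int, a < b ? -1 : a > b)
+//
+// creates the wrapper type 'node_tree' with NTInsert, NTRemove, NTFind,
+// NTFirst, NTLast, NTNext, NTPrev, and NTInit.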
+
+#define stb__bst_parent(x) x
+#define stb__bst_noparent(x)
+
+#define stb_bst_fields(N) \
+ *STB_(N,left), *STB_(N,right); \
+ unsigned char STB_(N,level)
+
+#define stb_bst_fields_parent(N) \
+ *STB_(N,left), *STB_(N,right), *STB_(N,parent); \
+ unsigned char STB_(N,level)
+
+#define STB__level(N,x) ((x) ? (x)->STB_(N,level) : 0)
+
+#define stb_bst_base(TYPE, N, TREE, M, compare, PAR) \
+ \
+static int STB_(N,_compare)(TYPE *p, TYPE *q) \
+{ \
+ compare \
+} \
+ \
+static void STB_(N,setleft)(TYPE *q, TYPE *v) \
+{ \
+ q->STB_(N,left) = v; \
+ PAR(if (v) v->STB_(N,parent) = q;) \
+} \
+ \
+static void STB_(N,setright)(TYPE *q, TYPE *v) \
+{ \
+ q->STB_(N,right) = v; \
+ PAR(if (v) v->STB_(N,parent) = q;) \
+} \
+ \
+static TYPE *STB_(N,skew)(TYPE *q) \
+{ \
+ if (q == NULL) return q; \
+ if (q->STB_(N,left) \
+ && q->STB_(N,left)->STB_(N,level) == q->STB_(N,level)) { \
+ TYPE *p = q->STB_(N,left); \
+ STB_(N,setleft)(q, p->STB_(N,right)); \
+ STB_(N,setright)(p, q); \
+ return p; \
+ } \
+ return q; \
+} \
+ \
+static TYPE *STB_(N,split)(TYPE *p) \
+{ \
+ TYPE *q = p->STB_(N,right); \
+ if (q && q->STB_(N,right) \
+ && q->STB_(N,right)->STB_(N,level) == p->STB_(N,level)) { \
+ STB_(N,setright)(p, q->STB_(N,left)); \
+ STB_(N,setleft)(q,p); \
+ ++q->STB_(N,level); \
+ return q; \
+ } \
+ return p; \
+} \
+ \
+TYPE *STB__(N,insert)(TYPE *tree, TYPE *item) \
+{ \
+ int c; \
+ if (tree == NULL) { \
+ item->STB_(N,left) = NULL; \
+ item->STB_(N,right) = NULL; \
+ item->STB_(N,level) = 1; \
+ PAR(item->STB_(N,parent) = NULL;) \
+ return item; \
+ } \
+ c = STB_(N,_compare)(item,tree); \
+ if (c == 0) { \
+ if (item != tree) { \
+ STB_(N,setleft)(item, tree->STB_(N,left)); \
+ STB_(N,setright)(item, tree->STB_(N,right)); \
+ item->STB_(N,level) = tree->STB_(N,level); \
+ PAR(item->STB_(N,parent) = NULL;) \
+ } \
+ return item; \
+ } \
+ if (c < 0) \
+ STB_(N,setleft )(tree, STB__(N,insert)(tree->STB_(N,left), item)); \
+ else \
+ STB_(N,setright)(tree, STB__(N,insert)(tree->STB_(N,right), item)); \
+ tree = STB_(N,skew)(tree); \
+ tree = STB_(N,split)(tree); \
+ PAR(tree->STB_(N,parent) = NULL;) \
+ return tree; \
+} \
+ \
+TYPE *STB__(N,remove)(TYPE *tree, TYPE *item) \
+{ \
+ static TYPE *delnode, *leaf, *restore; \
+ if (tree == NULL) return NULL; \
+ leaf = tree; \
+ if (STB_(N,_compare)(item, tree) < 0) { \
+ STB_(N,setleft)(tree, STB__(N,remove)(tree->STB_(N,left), item)); \
+ } else { \
+ TYPE *r; \
+ delnode = tree; \
+ r = STB__(N,remove)(tree->STB_(N,right), item); \
+ /* maybe move 'leaf' up to this location */ \
+ if (restore == tree) { tree = leaf; leaf = restore = NULL; } \
+ STB_(N,setright)(tree,r); \
+ assert(tree->STB_(N,right) != tree); \
+ } \
+ if (tree == leaf) { \
+ if (delnode == item) { \
+ tree = tree->STB_(N,right); \
+ assert(leaf->STB_(N,left) == NULL); \
+ /* move leaf (the right sibling) up to delnode */ \
+ STB_(N,setleft )(leaf, item->STB_(N,left )); \
+ STB_(N,setright)(leaf, item->STB_(N,right)); \
+ leaf->STB_(N,level) = item->STB_(N,level); \
+ if (leaf != item) \
+ restore = delnode; \
+ } \
+ delnode = NULL; \
+ } else { \
+ if (STB__level(N,tree->STB_(N,left) ) < tree->STB_(N,level)-1 || \
+ STB__level(N,tree->STB_(N,right)) < tree->STB_(N,level)-1) { \
+ --tree->STB_(N,level); \
+ if (STB__level(N,tree->STB_(N,right)) > tree->STB_(N,level)) \
+ tree->STB_(N,right)->STB_(N,level) = tree->STB_(N,level); \
+ tree = STB_(N,skew)(tree); \
+ STB_(N,setright)(tree, STB_(N,skew)(tree->STB_(N,right))); \
+ if (tree->STB_(N,right)) \
+ STB_(N,setright)(tree->STB_(N,right), \
+ STB_(N,skew)(tree->STB_(N,right)->STB_(N,right))); \
+ tree = STB_(N,split)(tree); \
+ if (tree->STB_(N,right)) \
+ STB_(N,setright)(tree, STB_(N,split)(tree->STB_(N,right))); \
+ } \
+ } \
+ PAR(if (tree) tree->STB_(N,parent) = NULL;) \
+ return tree; \
+} \
+ \
+TYPE *STB__(N,last)(TYPE *tree) \
+{ \
+ if (tree) \
+ while (tree->STB_(N,right)) tree = tree->STB_(N,right); \
+ return tree; \
+} \
+ \
+TYPE *STB__(N,first)(TYPE *tree) \
+{ \
+ if (tree) \
+ while (tree->STB_(N,left)) tree = tree->STB_(N,left); \
+ return tree; \
+} \
+ \
+TYPE *STB__(N,next)(TYPE *tree, TYPE *item) \
+{ \
+ TYPE *next = NULL; \
+ if (item->STB_(N,right)) \
+ return STB__(N,first)(item->STB_(N,right)); \
+ PAR( \
+ while(item->STB_(N,parent)) { \
+ TYPE *up = item->STB_(N,parent); \
+ if (up->STB_(N,left) == item) return up; \
+ item = up; \
+ } \
+ return NULL; \
+ ) \
+ while (tree != item) { \
+ if (STB_(N,_compare)(item, tree) < 0) { \
+ next = tree; \
+ tree = tree->STB_(N,left); \
+ } else { \
+ tree = tree->STB_(N,right); \
+ } \
+ } \
+ return next; \
+} \
+ \
+TYPE *STB__(N,prev)(TYPE *tree, TYPE *item) \
+{ \
+ TYPE *next = NULL; \
+ if (item->STB_(N,left)) \
+ return STB__(N,last)(item->STB_(N,left)); \
+ PAR( \
+ while(item->STB_(N,parent)) { \
+ TYPE *up = item->STB_(N,parent); \
+ if (up->STB_(N,right) == item) return up; \
+ item = up; \
+ } \
+ return NULL; \
+ ) \
+ while (tree != item) { \
+ if (STB_(N,_compare)(item, tree) < 0) { \
+ tree = tree->STB_(N,left); \
+ } else { \
+ next = tree; \
+ tree = tree->STB_(N,right); \
+ } \
+ } \
+ return next; \
+} \
+ \
+STB__DEBUG( \
+ void STB__(N,_validate)(TYPE *tree, int root) \
+ { \
+ if (tree == NULL) return; \
+ PAR(if(root) assert(tree->STB_(N,parent) == NULL);) \
+ assert(STB__level(N,tree->STB_(N,left) ) == tree->STB_(N,level)-1); \
+ assert(STB__level(N,tree->STB_(N,right)) <= tree->STB_(N,level)); \
+ assert(STB__level(N,tree->STB_(N,right)) >= tree->STB_(N,level)-1); \
+ if (tree->STB_(N,right)) { \
+ assert(STB__level(N,tree->STB_(N,right)->STB_(N,right)) \
+ != tree->STB_(N,level)); \
+ PAR(assert(tree->STB_(N,right)->STB_(N,parent) == tree);) \
+ } \
+ PAR(if(tree->STB_(N,left)) assert(tree->STB_(N,left)->STB_(N,parent) == tree);) \
+ STB__(N,_validate)(tree->STB_(N,left) ,0); \
+ STB__(N,_validate)(tree->STB_(N,right),0); \
+ } \
+) \
+ \
+typedef struct \
+{ \
+ TYPE *root; \
+} TREE; \
+ \
+void STB__(M,Insert)(TREE *tree, TYPE *item) \
+{ tree->root = STB__(N,insert)(tree->root, item); } \
+void STB__(M,Remove)(TREE *tree, TYPE *item) \
+{ tree->root = STB__(N,remove)(tree->root, item); } \
+TYPE *STB__(M,Next)(TREE *tree, TYPE *item) \
+{ return STB__(N,next)(tree->root, item); } \
+TYPE *STB__(M,Prev)(TREE *tree, TYPE *item) \
+{ return STB__(N,prev)(tree->root, item); } \
+TYPE *STB__(M,First)(TREE *tree) { return STB__(N,first)(tree->root); } \
+TYPE *STB__(M,Last) (TREE *tree) { return STB__(N,last) (tree->root); } \
+void STB__(M,Init)(TREE *tree) { tree->root = NULL; }
+
+
+#define stb_bst_find(N,tree,fcompare) \
+{ \
+ int c; \
+ while (tree != NULL) { \
+ fcompare \
+ if (c == 0) return tree; \
+ if (c < 0) tree = tree->STB_(N,left); \
+ else tree = tree->STB_(N,right); \
+ } \
+ return NULL; \
+}
+
+#define stb_bst_raw(TYPE,N,TREE,M,vfield,VTYPE,compare,PAR) \
+ stb_bst_base(TYPE,N,TREE,M, \
+ VTYPE a = p->vfield; VTYPE b = q->vfield; return (compare);, PAR ) \
+ \
+TYPE *STB__(N,find)(TYPE *tree, VTYPE a) \
+ stb_bst_find(N,tree,VTYPE b = tree->vfield; c = (compare);) \
+TYPE *STB__(M,Find)(TREE *tree, VTYPE a) \
+{ return STB__(N,find)(tree->root, a); }
+
+#define stb_bst(TYPE,N,TREE,M,vfield,VTYPE,compare) \
+ stb_bst_raw(TYPE,N,TREE,M,vfield,VTYPE,compare,stb__bst_noparent)
+#define stb_bst_parent(TYPE,N,TREE,M,vfield,VTYPE,compare) \
+ stb_bst_raw(TYPE,N,TREE,M,vfield,VTYPE,compare,stb__bst_parent)
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Pointer Nulling
+//
+// This lets you automatically NULL dangling pointers to "registered"
+// objects. Note that you have to make sure you call the appropriate
+// functions when you free or realloc blocks of memory that contain
+// pointers or pointer targets. stb.h can automatically do this for
+// stb_arr, or for all frees/reallocs if it's wrapping them.
+//
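+// Usage sketch (illustrative):
+//
+//    stb_nptr_set(&obj->target, thing);     // registered pointer write
+//    ...
+//    free(thing);
+//    stb_nptr_free(thing, sizeof(*thing));  // now obj->target == NULL
+//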
+
+#ifdef STB_NPTR
+
+STB_EXTERN void stb_nptr_set(void *address_of_pointer, void *value_to_write);
+STB_EXTERN void stb_nptr_didset(void *address_of_pointer);
+
+STB_EXTERN void stb_nptr_didfree(void *address_being_freed, int len);
+STB_EXTERN void stb_nptr_free(void *address_being_freed, int len);
+
+STB_EXTERN void stb_nptr_didrealloc(void *new_address, void *old_address, int len);
+STB_EXTERN void stb_nptr_recache(void); // recache all known pointers
+ // do this after pointer sets outside your control, slow
+
+#ifdef STB_DEFINE
+// for fast updating on free/realloc, we need to be able to find
+// all the objects (pointers and targets) within a given block;
+// this precludes hashing
+
+// we use a three-level hierarchy of memory to minimize storage:
+// level 1: 65536 pointers to stb__memory_node (always uses 256 KB)
+// level 2: each stb__memory_node represents a 64K block of memory
+// with STB__NPTR_NODE_NUM (32) stb__memory_leafs (worst case 8 MB)
+// level 3: each stb__memory_leaf represents 2048 bytes of memory
+// using a list of target locations and a list of pointers
+// (which are hopefully fairly short normally!)
+
+// this approach won't work on 64-bit targets, whose address space is far
+// larger than the 32 bits assumed here; it would need a redesign
+
+#define STB__NPTR_ROOT_LOG2 16
+#define STB__NPTR_ROOT_NUM (1 << STB__NPTR_ROOT_LOG2)
+#define STB__NPTR_ROOT_SHIFT (32 - STB__NPTR_ROOT_LOG2)
+
+#define STB__NPTR_NODE_LOG2 5
+#define STB__NPTR_NODE_NUM (1 << STB__NPTR_NODE_LOG2)
+#define STB__NPTR_NODE_MASK (STB__NPTR_NODE_NUM-1)
+#define STB__NPTR_NODE_SHIFT (STB__NPTR_ROOT_SHIFT - STB__NPTR_NODE_LOG2)
+#define STB__NPTR_NODE_OFFSET(x) (((x) >> STB__NPTR_NODE_SHIFT) & STB__NPTR_NODE_MASK)
+
+typedef struct stb__st_nptr
+{
+ void *ptr; // address of actual pointer
+ struct stb__st_nptr *next; // next pointer with same target
+ struct stb__st_nptr **prev; // prev pointer with same target, address of 'next' field (or first)
+ struct stb__st_nptr *next_in_block;
+} stb__nptr;
+
+typedef struct stb__st_nptr_target
+{
+ void *ptr; // address of target
+ stb__nptr *first; // address of first nptr pointing to this
+ struct stb__st_nptr_target *next_in_block;
+} stb__nptr_target;
+
+typedef struct
+{
+ stb__nptr *pointers;
+ stb__nptr_target *targets;
+} stb__memory_leaf;
+
+typedef struct
+{
+ stb__memory_leaf *children[STB__NPTR_NODE_NUM];
+} stb__memory_node;
+
+stb__memory_node *stb__memtab_root[STB__NPTR_ROOT_NUM];
+
+static stb__memory_leaf *stb__nptr_find_leaf(void *mem)
+{
+ stb_uint32 address = (stb_uint32) mem;
+ stb__memory_node *z = stb__memtab_root[address >> STB__NPTR_ROOT_SHIFT];
+ if (z)
+ return z->children[STB__NPTR_NODE_OFFSET(address)];
+ else
+ return NULL;
+}
+
+static void * stb__nptr_alloc(int size)
+{
+ return stb__realloc_raw(0,size);
+}
+
+static void stb__nptr_free(void *p)
+{
+ stb__realloc_raw(p,0);
+}
+
+static stb__memory_leaf *stb__nptr_make_leaf(void *mem)
+{
+ stb_uint32 address = (stb_uint32) mem;
+ stb__memory_node *z = stb__memtab_root[address >> STB__NPTR_ROOT_SHIFT];
+ stb__memory_leaf *f;
+ if (!z) {
+ int i;
+ z = (stb__memory_node *) stb__nptr_alloc(sizeof(*stb__memtab_root[0]));
+ stb__memtab_root[address >> STB__NPTR_ROOT_SHIFT] = z;
+ for (i=0; i < STB__NPTR_NODE_NUM; ++i)
+ z->children[i] = 0;
+ }
+ f = (stb__memory_leaf *) stb__nptr_alloc(sizeof(*f));
+ z->children[STB__NPTR_NODE_OFFSET(address)] = f;
+ f->pointers = NULL;
+ f->targets = NULL;
+ return f;
+}
+
+static stb__nptr_target *stb__nptr_find_target(void *target, int force)
+{
+ stb__memory_leaf *p = stb__nptr_find_leaf(target);
+ if (p) {
+ stb__nptr_target *t = p->targets;
+ while (t) {
+ if (t->ptr == target)
+ return t;
+ t = t->next_in_block;
+ }
+ }
+ if (force) {
+ stb__nptr_target *t = (stb__nptr_target*) stb__nptr_alloc(sizeof(*t));
+ if (!p) p = stb__nptr_make_leaf(target);
+ t->ptr = target;
+ t->first = NULL;
+ t->next_in_block = p->targets;
+ p->targets = t;
+ return t;
+ } else
+ return NULL;
+}
+
+static stb__nptr *stb__nptr_find_pointer(void *ptr, int force)
+{
+ stb__memory_leaf *p = stb__nptr_find_leaf(ptr);
+ if (p) {
+ stb__nptr *t = p->pointers;
+ while (t) {
+ if (t->ptr == ptr)
+ return t;
+ t = t->next_in_block;
+ }
+ }
+ if (force) {
+ stb__nptr *t = (stb__nptr *) stb__nptr_alloc(sizeof(*t));
+ if (!p) p = stb__nptr_make_leaf(ptr);
+ t->ptr = ptr;
+ t->next = NULL;
+ t->prev = NULL;
+ t->next_in_block = p->pointers;
+ p->pointers = t;
+ return t;
+ } else
+ return NULL;
+}
+
+void stb_nptr_set(void *address_of_pointer, void *value_to_write)
+{
+ if (*(void **)address_of_pointer != value_to_write) {
+ *(void **) address_of_pointer = value_to_write;
+ stb_nptr_didset(address_of_pointer);
+ }
+}
+
+void stb_nptr_didset(void *address_of_pointer)
+{
+ // first unlink from old chain
+ void *new_address;
+ stb__nptr *p = stb__nptr_find_pointer(address_of_pointer, 1); // force building if doesn't exist
+   if (p->prev) { // if p->prev is NULL, the record is freshly built or already unlinked
+ *(p->prev) = p->next;
+ if (p->next) p->next->prev = p->prev;
+ }
+ // now add to new chain
+ new_address = *(void **)address_of_pointer;
+ if (new_address != NULL) {
+ stb__nptr_target *t = stb__nptr_find_target(new_address, 1);
+ p->next = t->first;
+ if (p->next) p->next->prev = &p->next;
+ p->prev = &t->first;
+ t->first = p;
+ } else {
+ p->prev = NULL;
+ p->next = NULL;
+ }
+}
+
+void stb__nptr_block(void *address, int len, void (*function)(stb__memory_leaf *f, int datum, void *start, void *end), int datum)
+{
+ void *end_address = (void *) ((char *) address + len - 1);
+ stb__memory_node *n;
+ stb_uint32 start = (stb_uint32) address;
+ stb_uint32 end = start + len - 1;
+
+ int b0 = start >> STB__NPTR_ROOT_SHIFT;
+ int b1 = end >> STB__NPTR_ROOT_SHIFT;
+ int b=b0,i,e0,e1;
+
+ e0 = STB__NPTR_NODE_OFFSET(start);
+
+ if (datum <= 0) {
+ // first block
+ n = stb__memtab_root[b0];
+ if (n) {
+ if (b0 != b1)
+ e1 = STB__NPTR_NODE_NUM-1;
+ else
+ e1 = STB__NPTR_NODE_OFFSET(end);
+ for (i=e0; i <= e1; ++i)
+ if (n->children[i])
+ function(n->children[i], datum, address, end_address);
+ }
+ if (b1 > b0) {
+ // blocks other than the first and last block
+ for (b=b0+1; b < b1; ++b) {
+ n = stb__memtab_root[b];
+ if (n)
+ for (i=0; i <= STB__NPTR_NODE_NUM-1; ++i)
+ if (n->children[i])
+ function(n->children[i], datum, address, end_address);
+ }
+ // last block
+ n = stb__memtab_root[b1];
+ if (n) {
+ e1 = STB__NPTR_NODE_OFFSET(end);
+ for (i=0; i <= e1; ++i)
+ if (n->children[i])
+ function(n->children[i], datum, address, end_address);
+ }
+ }
+ } else {
+ if (b1 > b0) {
+ // last block
+ n = stb__memtab_root[b1];
+ if (n) {
+ e1 = STB__NPTR_NODE_OFFSET(end);
+ for (i=e1; i >= 0; --i)
+ if (n->children[i])
+ function(n->children[i], datum, address, end_address);
+ }
+ // blocks other than the first and last block
+ for (b=b1-1; b > b0; --b) {
+ n = stb__memtab_root[b];
+ if (n)
+ for (i=STB__NPTR_NODE_NUM-1; i >= 0; --i)
+ if (n->children[i])
+ function(n->children[i], datum, address, end_address);
+ }
+ }
+ // first block
+ n = stb__memtab_root[b0];
+ if (n) {
+ if (b0 != b1)
+ e1 = STB__NPTR_NODE_NUM-1;
+ else
+ e1 = STB__NPTR_NODE_OFFSET(end);
+ for (i=e1; i >= e0; --i)
+ if (n->children[i])
+ function(n->children[i], datum, address, end_address);
+ }
+ }
+}
+
+static void stb__nptr_delete_pointers(stb__memory_leaf *f, int offset, void *start, void *end)
+{
+ stb__nptr **p = &f->pointers;
+ while (*p) {
+ stb__nptr *n = *p;
+ if (n->ptr >= start && n->ptr <= end) {
+ // unlink
+ if (n->prev) {
+ *(n->prev) = n->next;
+ if (n->next) n->next->prev = n->prev;
+ }
+ *p = n->next_in_block;
+ stb__nptr_free(n);
+ } else
+ p = &(n->next_in_block);
+ }
+}
+
+static void stb__nptr_delete_targets(stb__memory_leaf *f, int offset, void *start, void *end)
+{
+ stb__nptr_target **p = &f->targets;
+ while (*p) {
+ stb__nptr_target *n = *p;
+ if (n->ptr >= start && n->ptr <= end) {
+ // null pointers
+ stb__nptr *z = n->first;
+ while (z) {
+ stb__nptr *y = z->next;
+ z->prev = NULL;
+ z->next = NULL;
+ *(void **) z->ptr = NULL;
+ z = y;
+ }
+ // unlink this target
+ *p = n->next_in_block;
+ stb__nptr_free(n);
+ } else
+ p = &(n->next_in_block);
+ }
+}
+
+void stb_nptr_didfree(void *address_being_freed, int len)
+{
+ // step one: delete all pointers in this block
+ stb__nptr_block(address_being_freed, len, stb__nptr_delete_pointers, 0);
+ // step two: NULL all pointers to this block; do this second to avoid NULLing deleted pointers
+ stb__nptr_block(address_being_freed, len, stb__nptr_delete_targets, 0);
+}
+
+void stb_nptr_free(void *address_being_freed, int len)
+{
+ free(address_being_freed);
+ stb_nptr_didfree(address_being_freed, len);
+}
+
+static void stb__nptr_move_targets(stb__memory_leaf *f, int offset, void *start, void *end)
+{
+ stb__nptr_target **t = &f->targets;
+ while (*t) {
+ stb__nptr_target *n = *t;
+ if (n->ptr >= start && n->ptr <= end) {
+ stb__nptr *z;
+ stb__memory_leaf *f;
+ // unlink n
+ *t = n->next_in_block;
+ // update n to new address
+ n->ptr = (void *) ((char *) n->ptr + offset);
+ f = stb__nptr_find_leaf(n->ptr);
+ if (!f) f = stb__nptr_make_leaf(n->ptr);
+ n->next_in_block = f->targets;
+ f->targets = n;
+ // now go through all pointers and make them point here
+ z = n->first;
+ while (z) {
+ *(void**) z->ptr = n->ptr;
+ z = z->next;
+ }
+ } else
+ t = &(n->next_in_block);
+ }
+}
+
+static void stb__nptr_move_pointers(stb__memory_leaf *f, int offset, void *start, void *end)
+{
+ stb__nptr **p = &f->pointers;
+ while (*p) {
+ stb__nptr *n = *p;
+ if (n->ptr >= start && n->ptr <= end) {
+ // unlink
+ *p = n->next_in_block;
+         n->ptr = (void *) ((char *) n->ptr + offset);
+ // move to new block
+ f = stb__nptr_find_leaf(n->ptr);
+ if (!f) f = stb__nptr_make_leaf(n->ptr);
+ n->next_in_block = f->pointers;
+ f->pointers = n;
+ } else
+ p = &(n->next_in_block);
+ }
+}
+
+void stb_nptr_realloc(void *new_address, void *old_address, int len)
+{
+ if (new_address == old_address) return;
+
+ // have to move the pointers first, because moving the targets
+ // requires writing to the pointers-to-the-targets, and if some of those moved too,
+ // we need to make sure we don't write to the old memory
+
+ // step one: move all pointers within the block
+ stb__nptr_block(old_address, len, stb__nptr_move_pointers, (char *) new_address - (char *) old_address);
+ // step two: move all targets within the block
+ stb__nptr_block(old_address, len, stb__nptr_move_targets, (char *) new_address - (char *) old_address);
+}
+
+void stb_nptr_move(void *new_address, void *old_address)
+{
+ stb_nptr_realloc(new_address, old_address, 1);
+}
+
+void stb_nptr_recache(void)
+{
+ int i,j;
+ for (i=0; i < STB__NPTR_ROOT_NUM; ++i)
+ if (stb__memtab_root[i])
+ for (j=0; j < STB__NPTR_NODE_NUM; ++j)
+ if (stb__memtab_root[i]->children[j]) {
+ stb__nptr *p = stb__memtab_root[i]->children[j]->pointers;
+ while (p) {
+ stb_nptr_didset(p->ptr);
+ p = p->next_in_block;
+ }
+ }
+}
+
+#endif // STB_DEFINE
+#endif // STB_NPTR
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// File Processing
+//
+
+
+#ifdef _WIN32
+ #define stb_rename(x,y) _wrename((const wchar_t *)stb__from_utf8(x), (const wchar_t *)stb__from_utf8_alt(y))
+#else
+ #define stb_rename rename
+#endif
+
+STB_EXTERN void stb_fput_varlen64(FILE *f, stb_uint64 v);
+STB_EXTERN stb_uint64 stb_fget_varlen64(FILE *f);
+STB_EXTERN int stb_size_varlen64(stb_uint64 v);
+
+
+#define stb_filec (char *) stb_file
+#define stb_fileu (unsigned char *) stb_file
+STB_EXTERN void * stb_file(char *filename, size_t *length);
+STB_EXTERN void * stb_file_max(char *filename, size_t *length);
+STB_EXTERN size_t stb_filelen(FILE *f);
+STB_EXTERN int stb_filewrite(char *filename, void *data, size_t length);
+STB_EXTERN int stb_filewritestr(char *filename, char *data);
+STB_EXTERN char ** stb_stringfile(char *filename, int *len);
+STB_EXTERN char ** stb_stringfile_trimmed(char *name, int *len, char comm);
+STB_EXTERN char * stb_fgets(char *buffer, int buflen, FILE *f);
+STB_EXTERN char * stb_fgets_malloc(FILE *f);
+STB_EXTERN int stb_fexists(char *filename);
+STB_EXTERN int stb_fcmp(char *s1, char *s2);
+STB_EXTERN int stb_feq(char *s1, char *s2);
+STB_EXTERN time_t stb_ftimestamp(char *filename);
+
+STB_EXTERN int stb_fullpath(char *abs, int abs_size, char *rel);
+STB_EXTERN FILE * stb_fopen(char *filename, const char *mode);
+STB_EXTERN int stb_fclose(FILE *f, int keep);
+
+enum
+{
+ stb_keep_no = 0,
+ stb_keep_yes = 1,
+ stb_keep_if_different = 2,
+};
+
+STB_EXTERN int stb_copyfile(char *src, char *dest);
+
+STB_EXTERN void stb_fput_varlen64(FILE *f, stb_uint64 v);
+STB_EXTERN stb_uint64 stb_fget_varlen64(FILE *f);
+STB_EXTERN int stb_size_varlen64(stb_uint64 v);
+
+STB_EXTERN void stb_fwrite32(FILE *f, stb_uint32 datum);
+STB_EXTERN void stb_fput_varlen (FILE *f, int v);
+STB_EXTERN void stb_fput_varlenu(FILE *f, unsigned int v);
+STB_EXTERN int stb_fget_varlen (FILE *f);
+STB_EXTERN stb_uint stb_fget_varlenu(FILE *f);
+STB_EXTERN void stb_fput_ranged (FILE *f, int v, int b, stb_uint n);
+STB_EXTERN int stb_fget_ranged (FILE *f, int b, stb_uint n);
+STB_EXTERN int stb_size_varlen (int v);
+STB_EXTERN int stb_size_varlenu(unsigned int v);
+STB_EXTERN int stb_size_ranged (int b, stb_uint n);
+
+STB_EXTERN int stb_fread(void *data, size_t len, size_t count, void *f);
+STB_EXTERN int stb_fwrite(void *data, size_t len, size_t count, void *f);
+
+#if 0
+typedef struct
+{
+ FILE *base_file;
+ char *buffer;
+ int buffer_size;
+ int buffer_off;
+ int buffer_left;
+} STBF;
+
+STB_EXTERN STBF *stb_tfopen(char *filename, char *mode);
+STB_EXTERN int stb_tfread(void *data, size_t len, size_t count, STBF *f);
+STB_EXTERN int stb_tfwrite(void *data, size_t len, size_t count, STBF *f);
+#endif
+
+#ifdef STB_DEFINE
+
+#if 0
+STBF *stb_tfopen(char *filename, char *mode)
+{
+ STBF *z;
+ FILE *f = stb_p_fopen(filename, mode);
+ if (!f) return NULL;
+ z = (STBF *) malloc(sizeof(*z));
+ if (!z) { fclose(f); return NULL; }
+ z->base_file = f;
+ if (!strcmp(mode, "rb") || !strcmp(mode, "wb")) {
+ z->buffer_size = 4096;
+ z->buffer_off = z->buffer_size;
+ z->buffer_left = 0;
+ z->buffer = malloc(z->buffer_size);
+ if (!z->buffer) { free(z); fclose(f); return NULL; }
+ } else {
+ z->buffer = 0;
+ z->buffer_size = 0;
+ z->buffer_left = 0;
+ }
+ return z;
+}
+
+int stb_tfread(void *data, size_t len, size_t count, STBF *f)
+{
+   int total = (int) (len*count), done=0;
+   if (!total) return 0;
+   if (total <= f->buffer_left) {
+      memcpy(data, f->buffer + f->buffer_off, total);
+      f->buffer_off += total;
+      f->buffer_left -= total;
+      return (int) count;
+   } else {
+      char *out = (char *) data;
+
+      // consume all buffered data
+      memcpy(data, f->buffer + f->buffer_off, f->buffer_left);
+      done = f->buffer_left;
+      out += f->buffer_left;
+      f->buffer_left = 0;
+
+      if (total-done > (f->buffer_size >> 1)) {
+         // large remainder: read straight through, bypassing the buffer
+         done += (int) fread(out, 1, total-done, f->base_file);
+      } else {
+         // small remainder: refill the buffer, then copy out of it
+         int n;
+         f->buffer_left = (int) fread(f->buffer, 1, f->buffer_size, f->base_file);
+         f->buffer_off = 0;
+         n = total-done < f->buffer_left ? total-done : f->buffer_left;
+         memcpy(out, f->buffer + f->buffer_off, n);
+         f->buffer_off += n;
+         f->buffer_left -= n;
+         done += n;
+      }
+      return done / (int) len;
+   }
+}
+#endif
+
+void stb_fwrite32(FILE *f, stb_uint32 x)
+{
+ fwrite(&x, 4, 1, f);
+}
+
+#if defined(_WIN32)
+ #define stb__stat _stat
+#else
+ #define stb__stat stat
+#endif
+
+int stb_fexists(char *filename)
+{
+ struct stb__stat buf;
+ return stb__windows(
+ _wstat((const wchar_t *)stb__from_utf8(filename), &buf),
+ stat(filename,&buf)
+ ) == 0;
+}
+
+time_t stb_ftimestamp(char *filename)
+{
+ struct stb__stat buf;
+ if (stb__windows(
+ _wstat((const wchar_t *)stb__from_utf8(filename), &buf),
+ stat(filename,&buf)
+ ) == 0)
+ {
+ return buf.st_mtime;
+ } else {
+ return 0;
+ }
+}
+
+size_t stb_filelen(FILE *f)
+{
+ long len, pos;
+ pos = ftell(f);
+ fseek(f, 0, SEEK_END);
+ len = ftell(f);
+ fseek(f, pos, SEEK_SET);
+ return (size_t) len;
+}
+
+void *stb_file(char *filename, size_t *length)
+{
+ FILE *f = stb__fopen(filename, "rb");
+ char *buffer;
+ size_t len, len2;
+ if (!f) return NULL;
+ len = stb_filelen(f);
+   buffer = (char *) malloc(len+2); // nul + extra
+   if (!buffer) { fclose(f); return NULL; }
+ len2 = fread(buffer, 1, len, f);
+ if (len2 == len) {
+ if (length) *length = len;
+ buffer[len] = 0;
+ } else {
+ free(buffer);
+ buffer = NULL;
+ }
+ fclose(f);
+ return buffer;
+}
+
+int stb_filewrite(char *filename, void *data, size_t length)
+{
+ FILE *f = stb_fopen(filename, "wb");
+ if (f) {
+ unsigned char *data_ptr = (unsigned char *) data;
+ size_t remaining = length;
+ while (remaining > 0) {
+ size_t len2 = remaining > 65536 ? 65536 : remaining;
+ size_t len3 = fwrite(data_ptr, 1, len2, f);
+ if (len2 != len3) {
+ fprintf(stderr, "Failed while writing %s\n", filename);
+ break;
+ }
+ remaining -= len2;
+ data_ptr += len2;
+ }
+ stb_fclose(f, stb_keep_if_different);
+ }
+ return f != NULL;
+}
+
+int stb_filewritestr(char *filename, char *data)
+{
+ return stb_filewrite(filename, data, strlen(data));
+}
+
+void * stb_file_max(char *filename, size_t *length)
+{
+ FILE *f = stb__fopen(filename, "rb");
+ char *buffer;
+ size_t len, maxlen;
+ if (!f) return NULL;
+ maxlen = *length;
+   buffer = (char *) malloc(maxlen+1);
+   if (!buffer) { fclose(f); return NULL; }
+ len = fread(buffer, 1, maxlen, f);
+ buffer[len] = 0;
+ fclose(f);
+ *length = len;
+ return buffer;
+}
+
+char ** stb_stringfile(char *filename, int *plen)
+{
+ FILE *f = stb__fopen(filename, "rb");
+ char *buffer, **list=NULL, *s;
+ size_t len, count, i;
+
+ if (!f) return NULL;
+ len = stb_filelen(f);
+   buffer = (char *) malloc(len+1);
+   if (!buffer) { fclose(f); return NULL; }
+ len = fread(buffer, 1, len, f);
+ buffer[len] = 0;
+ fclose(f);
+
+ // two passes through: first time count lines, second time set them
+ for (i=0; i < 2; ++i) {
+ s = buffer;
+ if (i == 1)
+ list[0] = s;
+ count = 1;
+ while (*s) {
+ if (*s == '\n' || *s == '\r') {
+            // detect if cr & lf are together (in either order); only a
+            // CR/LF pair sums to '\n'+'\r'
+            int crlf = (s[0] + s[1]) == ('\n' + '\r');
+ if (i == 1) *s = 0;
+ if (crlf) ++s;
+ if (s[1]) { // it's not over yet
+ if (i == 1) list[count] = s+1;
+ ++count;
+ }
+ }
+ ++s;
+ }
+ if (i == 0) {
+         list = (char **) malloc(sizeof(*list) * (count+1) + len+1);
+         if (!list) { free(buffer); return NULL; }
+ list[count] = 0;
+ // recopy the file so there's just a single allocation to free
+ memcpy(&list[count+1], buffer, len+1);
+ free(buffer);
+ buffer = (char *) &list[count+1];
+ if (plen) *plen = (int) count;
+ }
+ }
+ return list;
+}
+
+char ** stb_stringfile_trimmed(char *name, int *len, char comment)
+{
+ int i,n,o=0;
+ char **s = stb_stringfile(name, &n);
+ if (s == NULL) return NULL;
+ for (i=0; i < n; ++i) {
+ char *p = stb_skipwhite(s[i]);
+ if (*p && *p != comment)
+ s[o++] = p;
+ }
+ s[o] = NULL;
+ if (len) *len = o;
+ return s;
+}
+
+char * stb_fgets(char *buffer, int buflen, FILE *f)
+{
+ char *p;
+ buffer[0] = 0;
+ p = fgets(buffer, buflen, f);
+ if (p) {
+ int n = (int) (strlen(p)-1);
+ if (n >= 0)
+ if (p[n] == '\n')
+ p[n] = 0;
+ }
+ return p;
+}
+
+char * stb_fgets_malloc(FILE *f)
+{
+ // avoid reallocing for small strings
+ char quick_buffer[800];
+ quick_buffer[sizeof(quick_buffer)-2] = 0;
+ if (!fgets(quick_buffer, sizeof(quick_buffer), f))
+ return NULL;
+
+ if (quick_buffer[sizeof(quick_buffer)-2] == 0) {
+ size_t n = strlen(quick_buffer);
+ if (n > 0 && quick_buffer[n-1] == '\n')
+ quick_buffer[n-1] = 0;
+ return stb_p_strdup(quick_buffer);
+ } else {
+      char *p, *b;
+      char *a = stb_p_strdup(quick_buffer);
+      size_t len = sizeof(quick_buffer)-1;
+
+      while (!feof(f)) {
+         if (a[len-1] == '\n') break;
+         b = (char *) realloc(a, len*2);
+         if (!b) { free(a); return NULL; }
+         a = b;
+ p = &a[len];
+ p[len-2] = 0;
+ if (!fgets(p, (int) len, f))
+ break;
+ if (p[len-2] == 0) {
+ len += strlen(p);
+ break;
+ }
+ len = len + (len-1);
+ }
+ if (a[len-1] == '\n')
+ a[len-1] = 0;
+ return a;
+ }
+}
+
+int stb_fullpath(char *abs, int abs_size, char *rel)
+{
+ #ifdef _WIN32
+ return _fullpath(abs, rel, abs_size) != NULL;
+ #else
+ if (rel[0] == '/' || rel[0] == '~') {
+ if ((int) strlen(rel) >= abs_size)
+ return 0;
+      stb_p_strcpy_s(abs, abs_size, rel);
+ return STB_TRUE;
+ } else {
+      int n;
+      if (getcwd(abs, abs_size) == NULL)
+         return STB_FALSE;
+      n = (int) strlen(abs);
+      if (n+(int) strlen(rel)+2 <= abs_size) {
+         abs[n] = '/';
+         stb_p_strcpy_s(abs+n+1, abs_size-n-1, rel);
+ return STB_TRUE;
+ } else {
+ return STB_FALSE;
+ }
+ }
+ #endif
+}
+
+static int stb_fcmp_core(FILE *f, FILE *g)
+{
+ char buf1[1024],buf2[1024];
+ int n1,n2, res=0;
+
+ while (1) {
+ n1 = (int) fread(buf1, 1, sizeof(buf1), f);
+ n2 = (int) fread(buf2, 1, sizeof(buf2), g);
+ res = memcmp(buf1,buf2,stb_min(n1,n2));
+ if (res)
+ break;
+ if (n1 != n2) {
+ res = n1 < n2 ? -1 : 1;
+ break;
+ }
+ if (n1 == 0)
+ break;
+ }
+
+ fclose(f);
+ fclose(g);
+ return res;
+}
+
+int stb_fcmp(char *s1, char *s2)
+{
+ FILE *f = stb__fopen(s1, "rb");
+ FILE *g = stb__fopen(s2, "rb");
+
+   if (f == NULL || g == NULL) {
+      if (f) fclose(f);
+      if (g) fclose(g);
+      return f == g ? 0 : 1; // both missing compare equal; otherwise "different"
+   }
+
+ return stb_fcmp_core(f,g);
+}
+
+int stb_feq(char *s1, char *s2)
+{
+ FILE *f = stb__fopen(s1, "rb");
+ FILE *g = stb__fopen(s2, "rb");
+
+ if (f == NULL || g == NULL) {
+ if (f) fclose(f);
+ if (g) fclose(g);
+ return f == g;
+ }
+
+ // feq is faster because it shortcuts if they're different length
+ if (stb_filelen(f) != stb_filelen(g)) {
+ fclose(f);
+ fclose(g);
+ return 0;
+ }
+
+ return !stb_fcmp_core(f,g);
+}
+
+static stb_ptrmap *stb__files;
+
+typedef struct
+{
+ char *temp_name;
+ char *name;
+ int errors;
+} stb__file_data;
+
+static FILE *stb__open_temp_file(char *temp_name, char *src_name, const char *mode)
+{
+ size_t p;
+#ifdef _MSC_VER
+ int j;
+#endif
+ FILE *f;
+ // try to generate a temporary file in the same directory
+ p = strlen(src_name)-1;
+ while (p > 0 && src_name[p] != '/' && src_name[p] != '\\'
+ && src_name[p] != ':' && src_name[p] != '~')
+ --p;
+ ++p;
+
+ memcpy(temp_name, src_name, p);
+
+ #ifdef _MSC_VER
+ // try multiple times to make a temp file... just in
+ // case some other process makes the name first
+ for (j=0; j < 32; ++j) {
+ stb_p_strcpy_s(temp_name+p, 65536, "stmpXXXXXX");
+ if (!stb_p_mktemp(temp_name))
+ return 0;
+
+ f = stb_p_fopen(temp_name, mode);
+ if (f != NULL)
+ break;
+ }
+ #else
+ {
+ stb_p_strcpy_s(temp_name+p, 65536, "stmpXXXXXX");
+ #ifdef __MINGW32__
+ int fd = open(stb_p_mktemp(temp_name), O_RDWR);
+ #else
+ int fd = mkstemp(temp_name);
+ #endif
+ if (fd == -1) return NULL;
+ f = fdopen(fd, mode);
+ if (f == NULL) {
+ unlink(temp_name);
+ close(fd);
+ return NULL;
+ }
+ }
+ #endif
+ return f;
+}
+
+
+FILE * stb_fopen(char *filename, const char *mode)
+{
+ FILE *f;
+ char name_full[4096];
+ char temp_full[sizeof(name_full) + 12];
+
+ // @TODO: if the file doesn't exist, we can also use the fastpath here
+ if (mode[0] != 'w' && !strchr(mode, '+'))
+ return stb__fopen(filename, mode);
+
+ // save away the full path to the file so if the program
+ // changes the cwd everything still works right! unix has
+ // better ways to do this, but we have to work in windows
+ name_full[0] = '\0'; // stb_fullpath reads name_full[0]
+ if (stb_fullpath(name_full, sizeof(name_full), filename)==0)
+ return 0;
+
+ f = stb__open_temp_file(temp_full, name_full, mode);
+ if (f != NULL) {
+ stb__file_data *d = (stb__file_data *) malloc(sizeof(*d));
+      if (!d) { assert(0); /* malloc failure; effectively unreachable */ fclose(f); return NULL; }
+ if (stb__files == NULL) stb__files = stb_ptrmap_create();
+ d->temp_name = stb_p_strdup(temp_full);
+ d->name = stb_p_strdup(name_full);
+ d->errors = 0;
+ stb_ptrmap_add(stb__files, f, d);
+ return f;
+ }
+
+ return NULL;
+}
+
+int stb_fclose(FILE *f, int keep)
+{
+ stb__file_data *d;
+
+ int ok = STB_FALSE;
+ if (f == NULL) return 0;
+
+ if (ferror(f))
+ keep = stb_keep_no;
+
+ fclose(f);
+
+ if (stb__files && stb_ptrmap_remove(stb__files, f, (void **) &d)) {
+ if (stb__files->count == 0) {
+ stb_ptrmap_destroy(stb__files);
+ stb__files = NULL;
+ }
+ } else
+ return STB_TRUE; // not special
+
+ if (keep == stb_keep_if_different) {
+ // check if the files are identical
+ if (stb_feq(d->name, d->temp_name)) {
+ keep = stb_keep_no;
+ ok = STB_TRUE; // report success if no change
+ }
+ }
+
+ if (keep == stb_keep_no) {
+ remove(d->temp_name);
+ } else {
+ if (!stb_fexists(d->name)) {
+ // old file doesn't exist, so just move the new file over it
+ stb_rename(d->temp_name, d->name);
+ } else {
+ // don't delete the old file yet in case there are troubles! First rename it!
+ char preserved_old_file[4096];
+
+ // generate a temp filename in the same directory (also creates it, which we don't need)
+ FILE *dummy = stb__open_temp_file(preserved_old_file, d->name, "wb");
+ if (dummy != NULL) {
+ // we don't actually want the open file
+ fclose(dummy);
+
+ // discard what we just created
+ remove(preserved_old_file); // if this fails, there's nothing we can do, and following logic handles it as best as possible anyway
+
+ // move the existing file to the preserved name
+ if (0 != stb_rename(d->name, preserved_old_file)) { // 0 on success
+ // failed, state is:
+ // filename -> old file
+ // tempname -> new file
+ // keep tempname around so we don't lose data
+ } else {
+ // state is:
+ // preserved -> old file
+ // tempname -> new file
+ // move the new file to the old name
+ if (0 == stb_rename(d->temp_name, d->name)) {
+ // state is:
+ // preserved -> old file
+ // filename -> new file
+ ok = STB_TRUE;
+
+ // 'filename -> new file' has always been the goal, so clean up
+ remove(preserved_old_file); // nothing to be done if it fails
+ } else {
+ // couldn't rename, so try renaming preserved file back
+
+ // state is:
+ // preserved -> old file
+ // tempname -> new file
+ stb_rename(preserved_old_file, d->name);
+ // if the rename failed, there's nothing more we can do
+ }
+ }
+ } else {
+ // we couldn't get a temp filename. do this the naive way; the worst case failure here
+ // leaves the filename pointing to nothing and the new file as a tempfile
+ remove(d->name);
+ stb_rename(d->temp_name, d->name);
+ }
+ }
+ }
+
+ free(d->temp_name);
+ free(d->name);
+ free(d);
+
+ return ok;
+}
+
+int stb_copyfile(char *src, char *dest)
+{
+ char raw_buffer[1024];
+ char *buffer;
+ int buf_size = 65536;
+
+ FILE *f, *g;
+
+ // if file already exists at destination, do nothing
+ if (stb_feq(src, dest)) return STB_TRUE;
+
+ // open file
+ f = stb__fopen(src, "rb");
+ if (f == NULL) return STB_FALSE;
+
+ // open file for writing
+ g = stb__fopen(dest, "wb");
+ if (g == NULL) {
+ fclose(f);
+ return STB_FALSE;
+ }
+
+ buffer = (char *) malloc(buf_size);
+ if (buffer == NULL) {
+ buffer = raw_buffer;
+ buf_size = sizeof(raw_buffer);
+ }
+
+ while (!feof(f)) {
+ size_t n = fread(buffer, 1, buf_size, f);
+ if (n != 0)
+ fwrite(buffer, 1, n, g);
+ }
+
+ fclose(f);
+ if (buffer != raw_buffer)
+ free(buffer);
+
+ fclose(g);
+ return STB_TRUE;
+}
+
+// varlen:
+// v' = (v >> 31) + (v < 0 ? ~v : v)<<1; // small abs(v) => small v'
+// output v as big endian v'+k for v' <= k:
+//   1 byte :  v' <  0x00000080          (  -64 <= v <   64)   7 bits
+//   2 bytes:  v' <  0x00004000          (-8192 <= v < 8192)  14 bits
+//   3 bytes:  v' <  0x00200000                               21 bits
+//   4 bytes:  v' <  0x10000000                               28 bits
+// the number of most significant 1-bits in the first byte
+// equals the number of bytes after the first
+
+#define stb__varlen_xform(v) (v<0 ? (~v << 1)+1 : (v << 1))
+
+int stb_size_varlen(int v) { return stb_size_varlenu(stb__varlen_xform(v)); }
+int stb_size_varlenu(unsigned int v)
+{
+ if (v < 0x00000080) return 1;
+ if (v < 0x00004000) return 2;
+ if (v < 0x00200000) return 3;
+ if (v < 0x10000000) return 4;
+ return 5;
+}
+
+void stb_fput_varlen(FILE *f, int v) { stb_fput_varlenu(f, stb__varlen_xform(v)); }
+
+void stb_fput_varlenu(FILE *f, unsigned int z)
+{
+ if (z >= 0x10000000) fputc(0xF0,f);
+ if (z >= 0x00200000) fputc((z < 0x10000000 ? 0xE0 : 0)+(z>>24),f);
+ if (z >= 0x00004000) fputc((z < 0x00200000 ? 0xC0 : 0)+(z>>16),f);
+ if (z >= 0x00000080) fputc((z < 0x00004000 ? 0x80 : 0)+(z>> 8),f);
+ fputc(z,f);
+}
+
+#define stb_fgetc(f) ((unsigned char) fgetc(f))
+
+int stb_fget_varlen(FILE *f)
+{
+ unsigned int z = stb_fget_varlenu(f);
+ return (z & 1) ? ~(z>>1) : (z>>1);
+}
+
+unsigned int stb_fget_varlenu(FILE *f)
+{
+ unsigned int z;
+ unsigned char d;
+ d = stb_fgetc(f);
+
+ if (d >= 0x80) {
+ if (d >= 0xc0) {
+ if (d >= 0xe0) {
+ if (d == 0xf0) z = stb_fgetc(f) << 24;
+ else z = (d - 0xe0) << 24;
+ z += stb_fgetc(f) << 16;
+ }
+ else
+ z = (d - 0xc0) << 16;
+ z += stb_fgetc(f) << 8;
+ } else
+ z = (d - 0x80) << 8;
+ z += stb_fgetc(f);
+ } else
+ z = d;
+ return z;
+}
+
+stb_uint64 stb_fget_varlen64(FILE *f)
+{
+ stb_uint64 z;
+ unsigned char d;
+ d = stb_fgetc(f);
+
+ if (d >= 0x80) {
+ if (d >= 0xc0) {
+ if (d >= 0xe0) {
+ if (d >= 0xf0) {
+ if (d >= 0xf8) {
+ if (d >= 0xfc) {
+ if (d >= 0xfe) {
+ if (d >= 0xff)
+ z = (stb_uint64) stb_fgetc(f) << 56;
+ else
+ z = (stb_uint64) (d - 0xfe) << 56;
+ z |= (stb_uint64) stb_fgetc(f) << 48;
+ } else z = (stb_uint64) (d - 0xfc) << 48;
+ z |= (stb_uint64) stb_fgetc(f) << 40;
+ } else z = (stb_uint64) (d - 0xf8) << 40;
+ z |= (stb_uint64) stb_fgetc(f) << 32;
+ } else z = (stb_uint64) (d - 0xf0) << 32;
+ z |= (stb_uint) stb_fgetc(f) << 24;
+ } else z = (stb_uint) (d - 0xe0) << 24;
+ z |= (stb_uint) stb_fgetc(f) << 16;
+ } else z = (stb_uint) (d - 0xc0) << 16;
+ z |= (stb_uint) stb_fgetc(f) << 8;
+ } else z = (stb_uint) (d - 0x80) << 8;
+ z |= stb_fgetc(f);
+ } else
+ z = d;
+
+ return (z & 1) ? ~(z >> 1) : (z >> 1);
+}
+
+int stb_size_varlen64(stb_uint64 v)
+{
+ if (v < 0x00000080) return 1;
+ if (v < 0x00004000) return 2;
+ if (v < 0x00200000) return 3;
+ if (v < 0x10000000) return 4;
+ if (v < STB_IMM_UINT64(0x0000000800000000)) return 5;
+ if (v < STB_IMM_UINT64(0x0000040000000000)) return 6;
+ if (v < STB_IMM_UINT64(0x0002000000000000)) return 7;
+ if (v < STB_IMM_UINT64(0x0100000000000000)) return 8;
+ return 9;
+}
+
+void stb_fput_varlen64(FILE *f, stb_uint64 v)
+{
+ stb_uint64 z = stb__varlen_xform(v);
+ int first=1;
+ if (z >= STB_IMM_UINT64(0x100000000000000)) {
+ fputc(0xff,f);
+ first=0;
+ }
+ if (z >= STB_IMM_UINT64(0x02000000000000)) fputc((first ? 0xFE : 0)+(char)(z>>56),f), first=0;
+ if (z >= STB_IMM_UINT64(0x00040000000000)) fputc((first ? 0xFC : 0)+(char)(z>>48),f), first=0;
+ if (z >= STB_IMM_UINT64(0x00000800000000)) fputc((first ? 0xF8 : 0)+(char)(z>>40),f), first=0;
+ if (z >= STB_IMM_UINT64(0x00000010000000)) fputc((first ? 0xF0 : 0)+(char)(z>>32),f), first=0;
+ if (z >= STB_IMM_UINT64(0x00000000200000)) fputc((first ? 0xE0 : 0)+(char)(z>>24),f), first=0;
+ if (z >= STB_IMM_UINT64(0x00000000004000)) fputc((first ? 0xC0 : 0)+(char)(z>>16),f), first=0;
+ if (z >= STB_IMM_UINT64(0x00000000000080)) fputc((first ? 0x80 : 0)+(char)(z>> 8),f), first=0;
+ fputc((char)z,f);
+}
+
+void stb_fput_ranged(FILE *f, int v, int b, stb_uint n)
+{
+ v -= b;
+   if (n <= (1u << 31))   // 1u avoids undefined behavior in the signed shift
+ assert((stb_uint) v < n);
+ if (n > (1 << 24)) fputc(v >> 24, f);
+ if (n > (1 << 16)) fputc(v >> 16, f);
+ if (n > (1 << 8)) fputc(v >> 8, f);
+ fputc(v,f);
+}
+
+int stb_fget_ranged(FILE *f, int b, stb_uint n)
+{
+ unsigned int v=0;
+ if (n > (1 << 24)) v += stb_fgetc(f) << 24;
+ if (n > (1 << 16)) v += stb_fgetc(f) << 16;
+ if (n > (1 << 8)) v += stb_fgetc(f) << 8;
+ v += stb_fgetc(f);
+ return b+v;
+}
+
+int stb_size_ranged(int b, stb_uint n)
+{
+ if (n > (1 << 24)) return 4;
+ if (n > (1 << 16)) return 3;
+ if (n > (1 << 8)) return 2;
+ return 1;
+}
+
+void stb_fput_string(FILE *f, char *s)
+{
+ size_t len = strlen(s);
+ stb_fput_varlenu(f, (unsigned int) len);
+ fwrite(s, 1, len, f);
+}
+
+// inverse of the above algorithm
+char *stb_fget_string(FILE *f, void *p)
+{
+ char *s;
+ int len = stb_fget_varlenu(f);
+ if (len > 4096) return NULL;
+   s = p ? stb_malloc_string(p, len+1) : (char *) malloc(len+1);
+   if (!s) return NULL;
+   if (fread(s, 1, len, f) != (size_t) len) {
+      if (!p) free(s); // pool allocations are released with the pool
+      return NULL;
+   }
+ s[len] = 0;
+ return s;
+}
+
+char *stb_strdup(char *str, void *pool)
+{
+ size_t len = strlen(str);
+ char *p = stb_malloc_string(pool, len+1);
+ stb_p_strcpy_s(p, len+1, str);
+ return p;
+}
+
+// strip the trailing '/' or '\\' from a directory so we can refer to it
+// as a file for _stat()
+char *stb_strip_final_slash(char *t)
+{
+ if (t[0]) {
+ char *z = t + strlen(t) - 1;
+ // *z is the last character
+ if (*z == '\\' || *z == '/')
+ if (z != t+2 || t[1] != ':') // but don't strip it if it's e.g. "c:/"
+ *z = 0;
+ if (*z == '\\')
+ *z = '/'; // canonicalize to make sure it matches db
+ }
+ return t;
+}
+
+char *stb_strip_final_slash_regardless(char *t)
+{
+ if (t[0]) {
+ char *z = t + strlen(t) - 1;
+ // *z is the last character
+ if (*z == '\\' || *z == '/')
+ *z = 0;
+ if (*z == '\\')
+ *z = '/'; // canonicalize to make sure it matches db
+ }
+ return t;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Options parsing
+//
+
+STB_EXTERN char **stb_getopt_param(int *argc, char **argv, char *param);
+STB_EXTERN char **stb_getopt(int *argc, char **argv);
+STB_EXTERN void stb_getopt_free(char **opts);
+
+#ifdef STB_DEFINE
+
+void stb_getopt_free(char **opts)
+{
+ int i;
+ char ** o2 = opts;
+ for (i=0; i < stb_arr_len(o2); ++i)
+ free(o2[i]);
+ stb_arr_free(o2);
+}
+
+char **stb_getopt(int *argc, char **argv)
+{
+ return stb_getopt_param(argc, argv, (char*) "");
+}
+
+char **stb_getopt_param(int *argc, char **argv, char *param)
+{
+ char ** opts=NULL;
+ int i,j=1;
+ for (i=1; i < *argc; ++i) {
+ if (argv[i][0] != '-') {
+ argv[j++] = argv[i];
+ } else {
+ if (argv[i][1] == 0) { // plain - == don't parse further options
+ ++i;
+ while (i < *argc)
+ argv[j++] = argv[i++];
+ break;
+ } else if (argv[i][1] == '-') {
+ // copy argument through including initial '-' for clarity
+ stb_arr_push(opts, stb_p_strdup(argv[i]));
+ } else {
+ int k;
+ char *q = argv[i]; // traverse options list
+ for (k=1; q[k]; ++k) {
+ char *s;
+ if (strchr(param, q[k])) { // does it take a parameter?
+ char *t = &q[k+1], z = q[k];
+ size_t len=0;
+ if (*t == 0) {
+ if (i == *argc-1) { // takes a parameter, but none found
+ *argc = 0;
+ stb_getopt_free(opts);
+ return NULL;
+ }
+ t = argv[++i];
+ } else
+ k += (int) strlen(t);
+ len = strlen(t);
+ s = (char *) malloc(len+2);
+ if (!s) return NULL;
+ s[0] = z;
+ stb_p_strcpy_s(s+1, len+2, t);
+ } else {
+ // no parameter
+ s = (char *) malloc(2);
+ if (!s) return NULL;
+ s[0] = q[k];
+ s[1] = 0;
+ }
+ stb_arr_push(opts, s);
+ }
+ }
+ }
+ }
+ stb_arr_push(opts, NULL);
+ *argc = j;
+ return opts;
+}
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Portable directory reading
+//
+
+STB_EXTERN char **stb_readdir_files (char *dir);
+STB_EXTERN char **stb_readdir_files_mask(char *dir, char *wild);
+STB_EXTERN char **stb_readdir_subdirs(char *dir);
+STB_EXTERN char **stb_readdir_subdirs_mask(char *dir, char *wild);
+STB_EXTERN void stb_readdir_free (char **files);
+STB_EXTERN char **stb_readdir_recursive(char *dir, char *filespec);
+STB_EXTERN void stb_delete_directory_recursive(char *dir);
+
+#ifdef STB_DEFINE
+
+#ifdef _MSC_VER
+#include <io.h>
+#else
+#include <unistd.h>
+#include <dirent.h>
+#endif
+
+void stb_readdir_free(char **files)
+{
+ char **f2 = files;
+ int i;
+ for (i=0; i < stb_arr_len(f2); ++i)
+ free(f2[i]);
+ stb_arr_free(f2);
+}
+
+static int isdotdirname(char *name)
+{
+ if (name[0] == '.')
+ return (name[1] == '.') ? !name[2] : !name[1];
+ return 0;
+}
+
+STB_EXTERN int stb_wildmatchi(char *expr, char *candidate);
+static char **readdir_raw(char *dir, int return_subdirs, char *mask)
+{
+ char **results = NULL;
+ char buffer[4096], with_slash[4096];
+ size_t n;
+
+ #ifdef WIN32
+ stb__wchar *ws;
+ struct _wfinddata_t data;
+ #ifdef _WIN64
+ const intptr_t none = -1;
+ intptr_t z;
+ #else
+ const long none = -1;
+ long z;
+ #endif
+ #else // !WIN32
+ const DIR *none = NULL;
+ DIR *z;
+ #endif
+
+ n = stb_strscpy(buffer,dir,sizeof(buffer));
+ if (!n || n >= sizeof(buffer))
+ return NULL;
+ stb_fixpath(buffer);
+
+ if (n > 0 && (buffer[n-1] != '/')) {
+ buffer[n++] = '/';
+ }
+ buffer[n] = 0;
+ if (!stb_strscpy(with_slash,buffer,sizeof(with_slash)))
+ return NULL;
+
+ #ifdef WIN32
+ if (!stb_strscpy(buffer+n,"*.*",sizeof(buffer)-n))
+ return NULL;
+ ws = stb__from_utf8(buffer);
+ z = _wfindfirst((wchar_t *)ws, &data);
+ #else
+ z = opendir(dir);
+ #endif
+
+ if (z != none) {
+ int nonempty = STB_TRUE;
+ #ifndef WIN32
+ struct dirent *data = readdir(z);
+ nonempty = (data != NULL);
+ #endif
+
+ if (nonempty) {
+
+ do {
+ int is_subdir;
+ #ifdef WIN32
+ char *name = stb__to_utf8((stb__wchar *)data.name);
+ if (name == NULL) {
+ fprintf(stderr, "%s to convert '%S' to %s!\n", "Unable", data.name, "utf8");
+ continue;
+ }
+ is_subdir = !!(data.attrib & _A_SUBDIR);
+ #else
+ char *name = data->d_name;
+ if (!stb_strscpy(buffer+n,name,sizeof(buffer)-n))
+ break;
+ // Could follow DT_LNK, but would need to check for recursive links.
+         is_subdir = (data->d_type == DT_DIR); // d_type is an enum, not a bitmask
+ #endif
+
+ if (is_subdir == return_subdirs) {
+ if (!is_subdir || !isdotdirname(name)) {
+ if (!mask || stb_wildmatchi(mask, name)) {
+ char full[4096],*p=full; // don't shadow the outer 'buffer'
+ if ( stb_snprintf(full, sizeof(full), "%s%s", with_slash, name) < 0 )
+ break;
+ if (full[0] == '.' && full[1] == '/')
+ p = full+2;
+ stb_arr_push(results, stb_p_strdup(p));
+ }
+ }
+ }
+ }
+ #ifdef WIN32
+ while (0 == _wfindnext(z, &data));
+ #else
+ while ((data = readdir(z)) != NULL);
+ #endif
+ }
+ #ifdef WIN32
+ _findclose(z);
+ #else
+ closedir(z);
+ #endif
+ }
+ return results;
+}
+
+char **stb_readdir_files (char *dir) { return readdir_raw(dir, 0, NULL); }
+char **stb_readdir_subdirs(char *dir) { return readdir_raw(dir, 1, NULL); }
+char **stb_readdir_files_mask(char *dir, char *wild) { return readdir_raw(dir, 0, wild); }
+char **stb_readdir_subdirs_mask(char *dir, char *wild) { return readdir_raw(dir, 1, wild); }
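The POSIX branch of readdir_raw above follows the standard opendir/readdir/closedir pattern. A minimal standalone sketch of that pattern (hypothetical list_files helper; suffix matching here is a simplified stand-in for stb's wildcard mask, and all dot-prefixed entries are skipped rather than just "." and ".."):

```c
#include <dirent.h>
#include <string.h>

// Count entries in 'dir' whose names end with 'suffix'.
// Returns -1 if the directory can't be opened (mirrors readdir_raw's NULL result).
static int list_files(const char *dir, const char *suffix)
{
    int count = 0;
    DIR *z = opendir(dir);
    struct dirent *data;
    if (z == NULL) return -1;
    while ((data = readdir(z)) != NULL) {
        size_t n = strlen(data->d_name), m = strlen(suffix);
        if (data->d_name[0] == '.') continue; // skip ".", ".." and hidden entries
        if (n >= m && 0 == strcmp(data->d_name + n - m, suffix))
            ++count;
    }
    closedir(z);
    return count;
}
```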
+
+int stb__rec_max=0x7fffffff;
+static char **stb_readdir_rec(char **sofar, char *dir, char *filespec)
+{
+ char **files;
+ char ** dirs;
+ char **p;
+
+ if (stb_arr_len(sofar) >= stb__rec_max) return sofar;
+
+ files = stb_readdir_files_mask(dir, filespec);
+ stb_arr_for(p, files) {
+ stb_arr_push(sofar, stb_p_strdup(*p));
+ if (stb_arr_len(sofar) >= stb__rec_max) break;
+ }
+ stb_readdir_free(files);
+ if (stb_arr_len(sofar) >= stb__rec_max) return sofar;
+
+ dirs = stb_readdir_subdirs(dir);
+ stb_arr_for(p, dirs)
+ sofar = stb_readdir_rec(sofar, *p, filespec);
+ stb_readdir_free(dirs);
+ return sofar;
+}
+
+char **stb_readdir_recursive(char *dir, char *filespec)
+{
+ return stb_readdir_rec(NULL, dir, filespec);
+}
+
+void stb_delete_directory_recursive(char *dir)
+{
+ char **list = stb_readdir_subdirs(dir);
+ int i;
+ for (i=0; i < stb_arr_len(list); ++i)
+ stb_delete_directory_recursive(list[i]);
+ stb_arr_free(list);
+ list = stb_readdir_files(dir);
+ for (i=0; i < stb_arr_len(list); ++i)
+ if (remove(list[i])) { // remove() returns nonzero on failure
+ // on windows, try again after making it writeable; don't ALWAYS
+ // do this first since that would be slow in the normal case
+ #ifdef _MSC_VER
+ _chmod(list[i], _S_IWRITE);
+ remove(list[i]);
+ #endif
+ }
+ stb_arr_free(list);
+ stb__windows(_rmdir,rmdir)(dir);
+}
+
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// construct trees from filenames; useful for cmirror summaries
+
+typedef struct stb_dirtree2 stb_dirtree2;
+
+struct stb_dirtree2
+{
+ stb_dirtree2 **subdirs;
+
+ // make convenient for stb_summarize_tree
+ int num_subdir;
+ float weight;
+
+ // actual data
+ char *fullpath;
+ char *relpath;
+ char **files;
+};
+
+STB_EXTERN stb_dirtree2 *stb_dirtree2_from_files_relative(char *src, char **filelist, int count);
+STB_EXTERN stb_dirtree2 *stb_dirtree2_from_files(char **filelist, int count);
+STB_EXTERN int stb_dir_is_prefix(char *dir, int dirlen, char *file);
+
+#ifdef STB_DEFINE
+
+int stb_dir_is_prefix(char *dir, int dirlen, char *file)
+{
+ if (dirlen == 0) return STB_TRUE;
+ if (stb_strnicmp(dir, file, dirlen)) return STB_FALSE;
+ if (file[dirlen] == '/' || file[dirlen] == '\\') return STB_TRUE;
+ return STB_FALSE;
+}
+
+stb_dirtree2 *stb_dirtree2_from_files_relative(char *src, char **filelist, int count)
+{
+ char buffer1[1024];
+ int i;
+ int dlen = (int) strlen(src), elen;
+ stb_dirtree2 *d;
+ char ** descendents = NULL;
+ char ** files = NULL;
+ char *s;
+ if (!count) return NULL;
+ // first find all the ones that belong here... note this will take O(NM) with N files and M subdirs
+ for (i=0; i < count; ++i) {
+ if (stb_dir_is_prefix(src, dlen, filelist[i])) {
+ stb_arr_push(descendents, filelist[i]);
+ }
+ }
+ if (descendents == NULL)
+ return NULL;
+ elen = dlen;
+ // skip a leading slash
+ if (elen == 0 && (descendents[0][0] == '/' || descendents[0][0] == '\\'))
+ ++elen;
+ else if (elen)
+ ++elen;
+ // now extract all the ones that have their root here
+ for (i=0; i < stb_arr_len(descendents);) {
+ if (!stb_strchr2(descendents[i]+elen, '/', '\\')) {
+ stb_arr_push(files, descendents[i]);
+ descendents[i] = descendents[stb_arr_len(descendents)-1];
+ stb_arr_pop(descendents);
+ } else
+ ++i;
+ }
+ // now create a record
+ d = (stb_dirtree2 *) malloc(sizeof(*d));
+ d->files = files;
+ d->subdirs = NULL;
+ d->fullpath = stb_p_strdup(src);
+ s = stb_strrchr2(d->fullpath, '/', '\\');
+ if (s)
+ ++s;
+ else
+ s = d->fullpath;
+ d->relpath = s;
+ // now create the children
+ qsort(descendents, stb_arr_len(descendents), sizeof(char *), stb_qsort_stricmp(0));
+ buffer1[0] = 0;
+ for (i=0; i < stb_arr_len(descendents); ++i) {
+ char buffer2[1024];
+ char *s = descendents[i] + elen, *t;
+ t = stb_strchr2(s, '/', '\\');
+ assert(t);
+ stb_strncpy(buffer2, descendents[i], (int) (t-descendents[i]+1));
+ if (stb_stricmp(buffer1, buffer2)) {
+ stb_dirtree2 *t = stb_dirtree2_from_files_relative(buffer2, descendents, stb_arr_len(descendents));
+ assert(t != NULL);
+ stb_p_strcpy_s(buffer1, sizeof(buffer1), buffer2);
+ stb_arr_push(d->subdirs, t);
+ }
+ }
+ d->num_subdir = stb_arr_len(d->subdirs);
+ d->weight = 0;
+ return d;
+}
+
+stb_dirtree2 *stb_dirtree2_from_files(char **filelist, int count)
+{
+ return stb_dirtree2_from_files_relative((char*) "", filelist, count);
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Checksums: CRC-32, ADLER32, SHA-1
+//
+// CRC-32 and ADLER32 allow streaming blocks
+ // SHA-1 requires either a complete buffer (max size 2^32 - 73 bytes),
+ // or it can checksum directly from a file (max 2^61 bytes)
+
+#define STB_ADLER32_SEED 1
+#define STB_CRC32_SEED 0 // note that we logical NOT this in the code
+
+STB_EXTERN stb_uint
+ stb_adler32(stb_uint adler32, stb_uchar *buffer, stb_uint buflen);
+STB_EXTERN stb_uint
+ stb_crc32_block(stb_uint crc32, stb_uchar *buffer, stb_uint len);
+STB_EXTERN stb_uint stb_crc32(unsigned char *buffer, stb_uint len);
+
+STB_EXTERN void stb_sha1(
+ unsigned char output[20], unsigned char *buffer, unsigned int len);
+STB_EXTERN int stb_sha1_file(unsigned char output[20], char *file);
+
+STB_EXTERN void stb_sha1_readable(char display[27], unsigned char sha[20]);
+
+#ifdef STB_DEFINE
+stb_uint stb_crc32_block(stb_uint crc, unsigned char *buffer, stb_uint len)
+{
+ static stb_uint crc_table[256];
+ stb_uint i,j,s;
+ crc = ~crc;
+
+ if (crc_table[1] == 0)
+ for(i=0; i < 256; i++) {
+ for (s=i, j=0; j < 8; ++j)
+ s = (s >> 1) ^ (s & 1 ? 0xedb88320 : 0);
+ crc_table[i] = s;
+ }
+ for (i=0; i < len; ++i)
+ crc = (crc >> 8) ^ crc_table[buffer[i] ^ (crc & 0xff)];
+ return ~crc;
+}
+
+stb_uint stb_crc32(unsigned char *buffer, stb_uint len)
+{
+ return stb_crc32_block(0, buffer, len);
+}
+
+stb_uint stb_adler32(stb_uint adler32, stb_uchar *buffer, stb_uint buflen)
+{
+ const unsigned long ADLER_MOD = 65521;
+ unsigned long s1 = adler32 & 0xffff, s2 = adler32 >> 16;
+ unsigned long blocklen, i;
+
+ blocklen = buflen % 5552;
+ while (buflen) {
+ for (i=0; i + 7 < blocklen; i += 8) {
+ s1 += buffer[0], s2 += s1;
+ s1 += buffer[1], s2 += s1;
+ s1 += buffer[2], s2 += s1;
+ s1 += buffer[3], s2 += s1;
+ s1 += buffer[4], s2 += s1;
+ s1 += buffer[5], s2 += s1;
+ s1 += buffer[6], s2 += s1;
+ s1 += buffer[7], s2 += s1;
+
+ buffer += 8;
+ }
+
+ for (; i < blocklen; ++i)
+ s1 += *buffer++, s2 += s1;
+
+ s1 %= ADLER_MOD, s2 %= ADLER_MOD;
+ buflen -= blocklen;
+ blocklen = 5552;
+ }
+ return (s2 << 16) + s1;
+}
+
+static void stb__sha1(stb_uchar *chunk, stb_uint h[5])
+{
+ int i;
+ stb_uint a,b,c,d,e;
+ stb_uint w[80];
+
+ for (i=0; i < 16; ++i)
+ w[i] = stb_big32(&chunk[i*4]);
+ for (i=16; i < 80; ++i) {
+ stb_uint t;
+ t = w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16];
+ w[i] = (t << 1) | (t >> 31); // rotate left by 1
+ }
+
+ a = h[0];
+ b = h[1];
+ c = h[2];
+ d = h[3];
+ e = h[4];
+
+ #define STB__SHA1(k,f) \
+ { \
+ stb_uint temp = (a << 5) + (a >> 27) + (f) + e + (k) + w[i]; \
+ e = d; \
+ d = c; \
+ c = (b << 30) + (b >> 2); \
+ b = a; \
+ a = temp; \
+ }
+
+ i=0;
+ for (; i < 20; ++i) STB__SHA1(0x5a827999, d ^ (b & (c ^ d)) );
+ for (; i < 40; ++i) STB__SHA1(0x6ed9eba1, b ^ c ^ d );
+ for (; i < 60; ++i) STB__SHA1(0x8f1bbcdc, (b & c) + (d & (b ^ c)) );
+ for (; i < 80; ++i) STB__SHA1(0xca62c1d6, b ^ c ^ d );
+
+ #undef STB__SHA1
+
+ h[0] += a;
+ h[1] += b;
+ h[2] += c;
+ h[3] += d;
+ h[4] += e;
+}
+
+void stb_sha1(stb_uchar output[20], stb_uchar *buffer, stb_uint len)
+{
+ unsigned char final_block[128];
+ stb_uint end_start, final_len, j;
+ int i;
+
+ stb_uint h[5];
+
+ h[0] = 0x67452301;
+ h[1] = 0xefcdab89;
+ h[2] = 0x98badcfe;
+ h[3] = 0x10325476;
+ h[4] = 0xc3d2e1f0;
+
+ // we need to write padding to the last one or two
+ // blocks, so build those first into 'final_block'
+
+ // we have to write one special byte, plus the 8-byte length
+
+ // compute the block where the data runs out
+ end_start = len & ~63;
+
+ // compute the earliest we can encode the length
+ if (((len+9) & ~63) == end_start) {
+ // it all fits in one block, so fill a second-to-last block
+ end_start -= 64;
+ }
+
+ final_len = end_start + 128;
+
+ // now we need to copy the data in
+ assert(end_start + 128 >= len+9);
+ assert(end_start < len || len < 64-9);
+
+ j = 0;
+ if (end_start > len) // end_start wrapped negative (len < 55); choose j so end_start+j == 0
+ j = (stb_uint) - (int) end_start;
+
+ for (; end_start + j < len; ++j)
+ final_block[j] = buffer[end_start + j];
+ final_block[j++] = 0x80;
+ while (j < 128-5) // 5 byte length, so write 4 extra padding bytes
+ final_block[j++] = 0;
+ // big-endian size
+ final_block[j++] = len >> 29;
+ final_block[j++] = len >> 21;
+ final_block[j++] = len >> 13;
+ final_block[j++] = len >> 5;
+ final_block[j++] = len << 3;
+ assert(j == 128 && end_start + j == final_len);
+
+ for (j=0; j < final_len; j += 64) { // 512-bit chunks
+ if (j+64 >= end_start+64)
+ stb__sha1(&final_block[j - end_start], h);
+ else
+ stb__sha1(&buffer[j], h);
+ }
+
+ for (i=0; i < 5; ++i) {
+ output[i*4 + 0] = h[i] >> 24;
+ output[i*4 + 1] = h[i] >> 16;
+ output[i*4 + 2] = h[i] >> 8;
+ output[i*4 + 3] = h[i] >> 0;
+ }
+}
+
+#ifdef _MSC_VER
+int stb_sha1_file(stb_uchar output[20], char *file)
+{
+ int i;
+ stb_uint64 length=0;
+ unsigned char buffer[128];
+
+ FILE *f = stb__fopen(file, "rb");
+ stb_uint h[5];
+
+ if (f == NULL) return 0; // file not found
+
+ h[0] = 0x67452301;
+ h[1] = 0xefcdab89;
+ h[2] = 0x98badcfe;
+ h[3] = 0x10325476;
+ h[4] = 0xc3d2e1f0;
+
+ for(;;) {
+ size_t n = fread(buffer, 1, 64, f);
+ if (n == 64) {
+ stb__sha1(buffer, h);
+ length += n;
+ } else {
+ int block = 64;
+
+ length += n;
+
+ buffer[n++] = 0x80;
+
+ // if there isn't enough room for the length, double the block
+ if (n + 8 > 64)
+ block = 128;
+
+ // pad to end
+ memset(buffer+n, 0, block-8-n);
+
+ i = block - 8;
+ buffer[i++] = (stb_uchar) (length >> 53);
+ buffer[i++] = (stb_uchar) (length >> 45);
+ buffer[i++] = (stb_uchar) (length >> 37);
+ buffer[i++] = (stb_uchar) (length >> 29);
+ buffer[i++] = (stb_uchar) (length >> 21);
+ buffer[i++] = (stb_uchar) (length >> 13);
+ buffer[i++] = (stb_uchar) (length >> 5);
+ buffer[i++] = (stb_uchar) (length << 3);
+ assert(i == block);
+ stb__sha1(buffer, h);
+ if (block == 128)
+ stb__sha1(buffer+64, h);
+ else
+ assert(block == 64);
+ break;
+ }
+ }
+ fclose(f);
+
+ for (i=0; i < 5; ++i) {
+ output[i*4 + 0] = h[i] >> 24;
+ output[i*4 + 1] = h[i] >> 16;
+ output[i*4 + 2] = h[i] >> 8;
+ output[i*4 + 3] = h[i] >> 0;
+ }
+
+ return 1;
+}
+#endif // _MSC_VER
+
+// client can truncate this wherever they like
+void stb_sha1_readable(char display[27], unsigned char sha[20])
+{
+ char encoding[65] = "0123456789abcdefghijklmnopqrstuv"
+ "wxyzABCDEFGHIJKLMNOPQRSTUVWXYZ%$";
+ int num_bits = 0, acc=0;
+ int i=0,o=0;
+ while (o < 26) {
+ int v;
+ // expand the accumulator
+ if (num_bits < 6) {
+ assert(i != 20);
+ acc += sha[i++] << num_bits;
+ num_bits += 8;
+ }
+ v = acc & ((1 << 6) - 1);
+ display[o++] = encoding[v];
+ acc >>= 6;
+ num_bits -= 6;
+ }
+ assert(num_bits == 20*8 - 26*6);
+ display[o++] = encoding[acc];
+}
+
+#endif // STB_DEFINE
+
+///////////////////////////////////////////////////////////
+//
+// simplified WINDOWS registry interface... hopefully
+// we'll never actually use this?
+
+#if defined(_WIN32)
+
+STB_EXTERN void * stb_reg_open(const char *mode, const char *where); // mode: "rHKLM" or "rHKCU" or "w.."
+STB_EXTERN void stb_reg_close(void *reg);
+STB_EXTERN int stb_reg_read(void *zreg, const char *str, void *data, unsigned long len);
+STB_EXTERN int stb_reg_read_string(void *zreg, const char *str, char *data, int len);
+STB_EXTERN void stb_reg_write(void *zreg, const char *str, const void *data, unsigned long len);
+STB_EXTERN void stb_reg_write_string(void *zreg, const char *str, const char *data);
+
+#if defined(STB_DEFINE) && !defined(STB_NO_REGISTRY)
+
+#define STB_HAS_REGISTRY
+
+#ifndef _WINDOWS_
+
+#define HKEY void *
+
+STB_EXTERN __declspec(dllimport) long __stdcall RegCloseKey ( HKEY hKey );
+STB_EXTERN __declspec(dllimport) long __stdcall RegCreateKeyExA ( HKEY hKey, const char * lpSubKey,
+ int Reserved, char * lpClass, int dwOptions,
+ int samDesired, void *lpSecurityAttributes, HKEY * phkResult, int * lpdwDisposition );
+STB_EXTERN __declspec(dllimport) long __stdcall RegDeleteKeyA ( HKEY hKey, const char * lpSubKey );
+STB_EXTERN __declspec(dllimport) long __stdcall RegQueryValueExA ( HKEY hKey, const char * lpValueName,
+ int * lpReserved, unsigned long * lpType, unsigned char * lpData, unsigned long * lpcbData );
+STB_EXTERN __declspec(dllimport) long __stdcall RegSetValueExA ( HKEY hKey, const char * lpValueName,
+ int Reserved, int dwType, const unsigned char* lpData, int cbData );
+STB_EXTERN __declspec(dllimport) long __stdcall RegOpenKeyExA ( HKEY hKey, const char * lpSubKey,
+ int ulOptions, int samDesired, HKEY * phkResult );
+
+#endif // _WINDOWS_
+
+#define STB__REG_OPTION_NON_VOLATILE 0
+#define STB__REG_KEY_ALL_ACCESS 0x000f003f
+#define STB__REG_KEY_READ 0x00020019
+
+#ifdef _M_AMD64
+#define STB__HKEY_CURRENT_USER 0x80000001ull
+#define STB__HKEY_LOCAL_MACHINE 0x80000002ull
+#else
+#define STB__HKEY_CURRENT_USER 0x80000001
+#define STB__HKEY_LOCAL_MACHINE 0x80000002
+#endif
+
+void *stb_reg_open(const char *mode, const char *where)
+{
+ long res;
+ HKEY base;
+ HKEY zreg;
+ if (!stb_stricmp(mode+1, "cu") || !stb_stricmp(mode+1, "hkcu"))
+ base = (HKEY) STB__HKEY_CURRENT_USER;
+ else if (!stb_stricmp(mode+1, "lm") || !stb_stricmp(mode+1, "hklm"))
+ base = (HKEY) STB__HKEY_LOCAL_MACHINE;
+ else
+ return NULL;
+
+ if (mode[0] == 'r')
+ res = RegOpenKeyExA(base, where, 0, STB__REG_KEY_READ, &zreg);
+ else if (mode[0] == 'w')
+ res = RegCreateKeyExA(base, where, 0, NULL, STB__REG_OPTION_NON_VOLATILE, STB__REG_KEY_ALL_ACCESS, NULL, &zreg, NULL);
+ else
+ return NULL;
+
+ return res ? NULL : zreg;
+}
+
+void stb_reg_close(void *reg)
+{
+ RegCloseKey((HKEY) reg);
+}
+
+#define STB__REG_SZ 1
+#define STB__REG_BINARY 3
+#define STB__REG_DWORD 4
+
+int stb_reg_read(void *zreg, const char *str, void *data, unsigned long len)
+{
+ unsigned long type;
+ unsigned long alen = len;
+ if (0 == RegQueryValueExA((HKEY) zreg, str, 0, &type, (unsigned char *) data, &len))
+ if (type == STB__REG_BINARY || type == STB__REG_SZ || type == STB__REG_DWORD) {
+ if (len < alen)
+ *((char *) data + len) = 0;
+ return 1;
+ }
+ return 0;
+}
+
+void stb_reg_write(void *zreg, const char *str, const void *data, unsigned long len)
+{
+ if (zreg)
+ RegSetValueExA((HKEY) zreg, str, 0, STB__REG_BINARY, (const unsigned char *) data, len);
+}
+
+int stb_reg_read_string(void *zreg, const char *str, char *data, int len)
+{
+ if (!stb_reg_read(zreg, str, data, len)) return 0;
+ data[len-1] = 0; // force a 0 at the end of the string no matter what
+ return 1;
+}
+
+void stb_reg_write_string(void *zreg, const char *str, const char *data)
+{
+ if (zreg)
+ RegSetValueExA((HKEY) zreg, str, 0, STB__REG_SZ, (const unsigned char *) data, (int) strlen(data)+1);
+}
+#endif // STB_DEFINE
+#endif // _WIN32
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_cfg - This is like the registry, but the config info
+// is all stored in plain old files where we can
+// backup and restore them easily. The LOCATION of
+// the config files is gotten from... the registry!
+
+#ifndef STB_NO_STB_STRINGS
+typedef struct stb_cfg_st stb_cfg;
+
+STB_EXTERN stb_cfg * stb_cfg_open(char *config, const char *mode); // mode = "r", "w"
+STB_EXTERN void stb_cfg_close(stb_cfg *cfg);
+STB_EXTERN int stb_cfg_read(stb_cfg *cfg, char *key, void *value, int len);
+STB_EXTERN void stb_cfg_write(stb_cfg *cfg, char *key, void *value, int len);
+STB_EXTERN int stb_cfg_read_string(stb_cfg *cfg, char *key, char *value, int len);
+STB_EXTERN void stb_cfg_write_string(stb_cfg *cfg, char *key, char *value);
+STB_EXTERN int stb_cfg_delete(stb_cfg *cfg, char *key);
+STB_EXTERN void stb_cfg_set_directory(char *dir);
+
+#ifdef STB_DEFINE
+
+typedef struct
+{
+ char *key;
+ void *value;
+ int value_len;
+} stb__cfg_item;
+
+struct stb_cfg_st
+{
+ stb__cfg_item *data;
+ char *loaded_file; // this needs to be freed
+ FILE *f; // write the data to this file on close
+};
+
+static const char *stb__cfg_sig = "sTbCoNfIg!\0\0";
+static char stb__cfg_dir[512];
+STB_EXTERN void stb_cfg_set_directory(char *dir)
+{
+ stb_p_strcpy_s(stb__cfg_dir, sizeof(stb__cfg_dir), dir);
+}
+
+STB_EXTERN stb_cfg * stb_cfg_open(char *config, const char *mode)
+{
+ size_t len;
+ stb_cfg *z;
+ char file[512];
+ if (mode[0] != 'r' && mode[0] != 'w') return NULL;
+
+ if (!stb__cfg_dir[0]) {
+ #ifdef _WIN32
+ stb_p_strcpy_s(stb__cfg_dir, sizeof(stb__cfg_dir), "c:/stb");
+ #else
+ strcpy(stb__cfg_dir, "~/.stbconfig");
+ #endif
+
+ #ifdef STB_HAS_REGISTRY
+ {
+ void *reg = stb_reg_open("rHKLM", "Software\\SilverSpaceship\\stb");
+ if (reg) {
+ stb_reg_read_string(reg, "config_dir", stb__cfg_dir, sizeof(stb__cfg_dir));
+ stb_reg_close(reg);
+ }
+ }
+ #endif
+ }
+
+ stb_p_sprintf(file stb_p_size(sizeof(file)), "%s/%s.cfg", stb__cfg_dir, config);
+
+ z = (stb_cfg *) stb_malloc(0, sizeof(*z));
+ z->data = NULL;
+
+ z->loaded_file = stb_filec(file, &len);
+ if (z->loaded_file) {
+ char *s = z->loaded_file;
+ if (!memcmp(s, stb__cfg_sig, 12)) {
+ char *s = z->loaded_file + 12;
+ while (s < z->loaded_file + len) {
+ stb__cfg_item a;
+ int n = *(stb_int16 *) s;
+ a.key = s+2;
+ s = s+2 + n;
+ a.value_len = *(int *) s;
+ s += 4;
+ a.value = s;
+ s += a.value_len;
+ stb_arr_push(z->data, a);
+ }
+ assert(s == z->loaded_file + len);
+ }
+ }
+
+ if (mode[0] == 'w')
+ z->f = stb_p_fopen(file, "wb");
+ else
+ z->f = NULL;
+
+ return z;
+}
+
+void stb_cfg_close(stb_cfg *z)
+{
+ if (z->f) {
+ int i;
+ // write the file out
+ fwrite(stb__cfg_sig, 12, 1, z->f);
+ for (i=0; i < stb_arr_len(z->data); ++i) {
+ stb_int16 n = (stb_int16) strlen(z->data[i].key)+1;
+ fwrite(&n, 2, 1, z->f);
+ fwrite(z->data[i].key, n, 1, z->f);
+ fwrite(&z->data[i].value_len, 4, 1, z->f);
+ fwrite(z->data[i].value, z->data[i].value_len, 1, z->f);
+ }
+ fclose(z->f);
+ }
+ stb_arr_free(z->data);
+ stb_free(z);
+}
+
+int stb_cfg_read(stb_cfg *z, char *key, void *value, int len)
+{
+ int i;
+ for (i=0; i < stb_arr_len(z->data); ++i) {
+ if (!stb_stricmp(z->data[i].key, key)) {
+ int n = stb_min(len, z->data[i].value_len);
+ memcpy(value, z->data[i].value, n);
+ if (n < len)
+ *((char *) value + n) = 0;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+void stb_cfg_write(stb_cfg *z, char *key, void *value, int len)
+{
+ int i;
+ for (i=0; i < stb_arr_len(z->data); ++i)
+ if (!stb_stricmp(z->data[i].key, key))
+ break;
+ if (i == stb_arr_len(z->data)) {
+ stb__cfg_item p;
+ p.key = stb_strdup(key, z);
+ p.value = NULL;
+ p.value_len = 0;
+ stb_arr_push(z->data, p);
+ }
+ z->data[i].value = stb_malloc(z, len);
+ z->data[i].value_len = len;
+ memcpy(z->data[i].value, value, len);
+}
+
+int stb_cfg_delete(stb_cfg *z, char *key)
+{
+ int i;
+ for (i=0; i < stb_arr_len(z->data); ++i)
+ if (!stb_stricmp(z->data[i].key, key)) {
+ stb_arr_fastdelete(z->data, i);
+ return 1;
+ }
+ return 0;
+}
+
+int stb_cfg_read_string(stb_cfg *z, char *key, char *value, int len)
+{
+ if (!stb_cfg_read(z, key, value, len)) return 0;
+ value[len-1] = 0;
+ return 1;
+}
+
+void stb_cfg_write_string(stb_cfg *z, char *key, char *value)
+{
+ stb_cfg_write(z, key, value, (int) strlen(value)+1);
+}
+#endif
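The on-disk layout written by stb_cfg_close is a 12-byte signature followed by records of (16-bit key length including the NUL, key bytes, 32-bit value length, value bytes), native-endian. A minimal in-memory round-trip of one record (hypothetical cfg_put/cfg_get helpers, no bounds checking):

```c
#include <string.h>

// Serialize one key/value record in the stb_cfg layout; returns bytes written.
static int cfg_put(unsigned char *buf, const char *key,
                   const void *value, int value_len)
{
    short n = (short) strlen(key) + 1;   // key length including NUL
    unsigned char *p = buf;
    memcpy(p, &n, 2);            p += 2;
    memcpy(p, key, n);           p += n;
    memcpy(p, &value_len, 4);    p += 4;
    memcpy(p, value, value_len); p += value_len;
    return (int) (p - buf);
}

// Parse one record back out; returns bytes consumed.
static int cfg_get(const unsigned char *buf, const char **key,
                   const void **value, int *value_len)
{
    short n;
    const unsigned char *p = buf;
    memcpy(&n, p, 2);            p += 2;
    *key = (const char *) p;     p += n;
    memcpy(value_len, p, 4);     p += 4;
    *value = p;                  p += *value_len;
    return (int) (p - buf);
}
```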
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_dirtree - load a description of a directory tree
+// uses a cache and stat()s the directories for changes
+// MUCH faster on NTFS, _wrong_ on FAT32, so should
+// ignore the db on FAT32
+
+#ifdef _WIN32
+
+typedef struct
+{
+ char * path; // full path from passed-in root
+ time_t last_modified;
+ int num_files;
+ int flag;
+} stb_dirtree_dir;
+
+typedef struct
+{
+ char *name; // name relative to path
+ int dir; // index into dirs[] array
+ stb_int64 size; // size in bytes (format version 02 allows > 4GB)
+ time_t last_modified;
+ int flag;
+} stb_dirtree_file;
+
+typedef struct
+{
+ stb_dirtree_dir *dirs;
+ stb_dirtree_file *files;
+
+ // internal use
+ void * string_pool; // used to free data en masse
+} stb_dirtree;
+
+extern void stb_dirtree_free ( stb_dirtree *d );
+extern stb_dirtree *stb_dirtree_get ( char *dir);
+extern stb_dirtree *stb_dirtree_get_dir ( char *dir, char *cache_dir);
+extern stb_dirtree *stb_dirtree_get_with_file ( char *dir, char *cache_file);
+
+// get a list of all the files recursively underneath 'dir'
+//
+// cache_file is used to store a copy of the directory tree to speed up
+// later calls. It must be unique to 'dir' and the current working
+// directory! Otherwise who knows what will happen (a good solution
+// is to put it _in_ dir, but this API doesn't force that).
+//
+// Also, it might be possible to break this if you have two different processes
+// do a call to stb_dirtree_get() with the same cache file at about the same
+// time, but I _think_ it might just work.
+
+// i needed to build an identical data structure representing the state of
+// a mirrored copy WITHOUT bothering to rescan it (i.e. we're mirroring to
+// it WITHOUT scanning it, e.g. it's over the net), so this requires access
+// to all of the innards.
+extern void stb_dirtree_db_add_dir(stb_dirtree *active, char *path, time_t last);
+extern void stb_dirtree_db_add_file(stb_dirtree *active, char *name, int dir, stb_int64 size, time_t last);
+extern void stb_dirtree_db_read(stb_dirtree *target, char *filename, char *dir);
+extern void stb_dirtree_db_write(stb_dirtree *target, char *filename, char *dir);
+
+#ifdef STB_DEFINE
+static void stb__dirtree_add_dir(char *path, time_t last, stb_dirtree *active)
+{
+ stb_dirtree_dir d;
+ d.last_modified = last;
+ d.num_files = 0;
+ d.path = stb_strdup(path, active->string_pool);
+ stb_arr_push(active->dirs, d);
+}
+
+static void stb__dirtree_add_file(char *name, int dir, stb_int64 size, time_t last, stb_dirtree *active)
+{
+ stb_dirtree_file f;
+ f.dir = dir;
+ f.size = size;
+ f.last_modified = last;
+ f.name = stb_strdup(name, active->string_pool);
+ ++active->dirs[dir].num_files;
+ stb_arr_push(active->files, f);
+}
+
+// version 02 supports > 4GB files
+static char stb__signature[12] = { 's', 'T', 'b', 'D', 'i', 'R', 't', 'R', 'e', 'E', '0', '2' };
+
+static void stb__dirtree_save_db(char *filename, stb_dirtree *data, char *root)
+{
+ int i, num_dirs_final=0, num_files_final;
+ char *info = root ? root : (char*)"";
+ int *remap;
+ FILE *f = stb_p_fopen(filename, "wb");
+ if (!f) return;
+
+ fwrite(stb__signature, sizeof(stb__signature), 1, f);
+ fwrite(info, strlen(info)+1, 1, f);
+ // need to be slightly tricky and not write out NULLed directories, nor the root
+
+ // build remapping table of all dirs we'll be writing out
+ remap = (int *) malloc(sizeof(remap[0]) * stb_arr_len(data->dirs));
+ for (i=0; i < stb_arr_len(data->dirs); ++i) {
+ if (data->dirs[i].path == NULL || (root && 0==stb_stricmp(data->dirs[i].path, root))) {
+ remap[i] = -1;
+ } else {
+ remap[i] = num_dirs_final++;
+ }
+ }
+
+ fwrite(&num_dirs_final, 4, 1, f);
+ for (i=0; i < stb_arr_len(data->dirs); ++i) {
+ if (remap[i] >= 0) {
+ fwrite(&data->dirs[i].last_modified, 4, 1, f);
+ stb_fput_string(f, data->dirs[i].path);
+ }
+ }
+
+ num_files_final = 0;
+ for (i=0; i < stb_arr_len(data->files); ++i)
+ if (remap[data->files[i].dir] >= 0 && data->files[i].name)
+ ++num_files_final;
+
+ fwrite(&num_files_final, 4, 1, f);
+ for (i=0; i < stb_arr_len(data->files); ++i) {
+ if (remap[data->files[i].dir] >= 0 && data->files[i].name) {
+ stb_fput_ranged(f, remap[data->files[i].dir], 0, num_dirs_final);
+ stb_fput_varlen64(f, data->files[i].size);
+ fwrite(&data->files[i].last_modified, 4, 1, f);
+ stb_fput_string(f, data->files[i].name);
+ }
+ }
+
+ fclose(f);
+}
+
+// note: stomps any existing data, rather than appending
+static void stb__dirtree_load_db(char *filename, stb_dirtree *data, char *dir)
+{
+ char sig[2048];
+ int i,n;
+ FILE *f = stb_p_fopen(filename, "rb");
+
+ if (!f) return;
+
+ data->string_pool = stb_malloc(0,1);
+
+ fread(sig, sizeof(stb__signature), 1, f);
+ if (memcmp(stb__signature, sig, sizeof(stb__signature))) { fclose(f); return; }
+ if (!fread(sig, strlen(dir)+1, 1, f)) { fclose(f); return; }
+ if (stb_stricmp(sig,dir)) { fclose(f); return; }
+
+ // we can just read them straight in, because they're guaranteed to be valid
+ fread(&n, 4, 1, f);
+ stb_arr_setlen(data->dirs, n);
+ for(i=0; i < stb_arr_len(data->dirs); ++i) {
+ fread(&data->dirs[i].last_modified, 4, 1, f);
+ data->dirs[i].path = stb_fget_string(f, data->string_pool);
+ if (data->dirs[i].path == NULL) goto bail;
+ }
+ fread(&n, 4, 1, f);
+ stb_arr_setlen(data->files, n);
+ for (i=0; i < stb_arr_len(data->files); ++i) {
+ data->files[i].dir = stb_fget_ranged(f, 0, stb_arr_len(data->dirs));
+ data->files[i].size = stb_fget_varlen64(f);
+ fread(&data->files[i].last_modified, 4, 1, f);
+ data->files[i].name = stb_fget_string(f, data->string_pool);
+ if (data->files[i].name == NULL) goto bail;
+ }
+
+ if (0) {
+ bail:
+ stb_arr_free(data->dirs);
+ stb_arr_free(data->files);
+ }
+ fclose(f);
+}
+
+FILE *hlog;
+
+static int stb__dircount, stb__dircount_mask, stb__showfile;
+static void stb__dirtree_scandir(char *path, time_t last_time, stb_dirtree *active)
+{
+ // this is dumb depth first; theoretically it might be faster
+ // to fully traverse each directory before visiting its children,
+ // but it's complicated and didn't seem like a gain in the test app
+
+ int n;
+
+ struct _wfinddatai64_t c_file;
+ long hFile;
+ stb__wchar full_path[1024];
+ int has_slash;
+ if (stb__showfile) printf("<");
+
+ has_slash = (path[0] && path[strlen(path)-1] == '/');
+
+ // @TODO: do this concatenation without using swprintf to avoid this mess:
+#if (defined(_MSC_VER) && _MSC_VER < 1400) // || (defined(__clang__))
+ // confusingly, Windows Kits\10 needs to go down this path?!?
+ // except now it doesn't, I don't know what changed
+ if (has_slash)
+ swprintf(full_path, L"%s*", stb__from_utf8(path));
+ else
+ swprintf(full_path, L"%s/*", stb__from_utf8(path));
+#else
+ if (has_slash)
+ swprintf((wchar_t *) full_path, (size_t) 1024, L"%s*", (wchar_t *) stb__from_utf8(path));
+ else
+ swprintf((wchar_t *) full_path, (size_t) 1024, L"%s/*", (wchar_t *) stb__from_utf8(path));
+#endif
+
+ // it's possible this directory is already present: that means it was in the
+ // cache, but its parent wasn't... in that case, we're done with it
+ if (stb__showfile) printf("C[%d]", stb_arr_len(active->dirs));
+ for (n=0; n < stb_arr_len(active->dirs); ++n)
+ if (0 == stb_stricmp(active->dirs[n].path, path)) {
+ if (stb__showfile) printf("D");
+ return;
+ }
+ if (stb__showfile) printf("E");
+
+ // otherwise, we need to add it
+ stb__dirtree_add_dir(path, last_time, active);
+ n = stb_arr_lastn(active->dirs);
+
+ if (stb__showfile) printf("[");
+ if( (hFile = (long) _wfindfirsti64( (wchar_t *) full_path, &c_file )) != -1L ) {
+ do {
+ if (stb__showfile) printf(")");
+ if (c_file.attrib & _A_SUBDIR) {
+ // ignore subdirectories starting with '.', e.g. "." and ".."
+ if (c_file.name[0] != '.') {
+ char *new_path = (char *) full_path;
+ char *temp = stb__to_utf8((stb__wchar *) c_file.name);
+
+ if (has_slash)
+ stb_p_sprintf(new_path stb_p_size(sizeof(full_path)), "%s%s", path, temp);
+ else
+ stb_p_sprintf(new_path stb_p_size(sizeof(full_path)), "%s/%s", path, temp);
+
+ if (stb__dircount_mask) {
+ ++stb__dircount;
+ if (!(stb__dircount & stb__dircount_mask)) {
+ char dummy_path[128], *pad;
+ stb_strncpy(dummy_path, new_path, sizeof(dummy_path)-1);
+ if (strlen(dummy_path) > 96) {
+ stb_p_strcpy_s(dummy_path+96/2-1,128, "...");
+ stb_p_strcpy_s(dummy_path+96/2+2,128, new_path + strlen(new_path)-96/2+2);
+ }
+ pad = dummy_path + strlen(dummy_path);
+ while (pad < dummy_path+98)
+ *pad++ = ' ';
+ *pad = 0;
+ printf("%s\r", dummy_path);
+ #if 0
+ if (hlog == 0) {
+ hlog = stb_p_fopen("c:/x/temp.log", "w");
+ fprintf(hlog, "%s\n", dummy_path);
+ }
+ #endif
+ }
+ }
+
+ stb__dirtree_scandir(new_path, c_file.time_write, active);
+ }
+ } else {
+ char *temp = stb__to_utf8((stb__wchar *) c_file.name);
+ stb__dirtree_add_file(temp, n, c_file.size, c_file.time_write, active);
+ }
+ if (stb__showfile) printf("(");
+ } while( _wfindnexti64( hFile, &c_file ) == 0 );
+ if (stb__showfile) printf("]");
+ _findclose( hFile );
+ }
+ if (stb__showfile) printf(">\n");
+}
+
+// scan the database and see if it's all valid
+static int stb__dirtree_update_db(stb_dirtree *db, stb_dirtree *active)
+{
+ int changes_detected = STB_FALSE;
+ int i;
+ int *remap;
+ int *rescan=NULL;
+ remap = (int *) malloc(sizeof(remap[0]) * stb_arr_len(db->dirs));
+ memset(remap, 0, sizeof(remap[0]) * stb_arr_len(db->dirs));
+ rescan = NULL;
+
+ for (i=0; i < stb_arr_len(db->dirs); ++i) {
+ struct _stat info;
+ if (stb__dircount_mask) {
+ ++stb__dircount;
+ if (!(stb__dircount & stb__dircount_mask)) {
+ printf(".");
+ }
+ }
+ if (0 == _stat(db->dirs[i].path, &info)) {
+ if (info.st_mode & _S_IFDIR) {
+ // it's still a directory, as expected
+ int n = abs((int) (info.st_mtime - db->dirs[i].last_modified));
+ if (n > 1 && n != 3600) { // the 3600 is a hack because sometimes this jumps for no apparent reason, even when no time zone or DST issues are at play
+ // it's changed! force a rescan
+ // we don't want to scan it until we've stat()d its
+ // subdirs, though, so we queue it
+ if (stb__showfile) printf("Changed: %s - %08x:%08x\n", db->dirs[i].path, (unsigned int) db->dirs[i].last_modified, (unsigned int) info.st_mtime);
+ stb_arr_push(rescan, i);
+ // update the last_mod time
+ db->dirs[i].last_modified = info.st_mtime;
+ // ignore existing files in this dir
+ remap[i] = -1;
+ changes_detected = STB_TRUE;
+ } else {
+ // it hasn't changed, just copy it through unchanged
+ stb__dirtree_add_dir(db->dirs[i].path, db->dirs[i].last_modified, active);
+ remap[i] = stb_arr_lastn(active->dirs);
+ }
+ } else {
+ // this path used to refer to a directory, but now it's a file!
+ // assume that the parent directory is going to be forced to rescan anyway
+ goto delete_entry;
+ }
+ } else {
+ delete_entry:
+ // directory no longer exists, so don't copy it
+ // we don't free it because it's in the string pool now
+ db->dirs[i].path = NULL;
+ remap[i] = -1;
+ changes_detected = STB_TRUE;
+ }
+ }
+
+ // at this point, we have:
+ //
+ // rescan holds a list of directory indices that need to be scanned due to being out of date
+ // remap holds the directory index in active for each dir in db, if it exists; -1 if not
+ // directories in rescan are not in active yet
+
+ // so we can go ahead and remap all the known files right now
+ for (i=0; i < stb_arr_len(db->files); ++i) {
+ int dir = db->files[i].dir;
+ if (remap[dir] >= 0) {
+ stb__dirtree_add_file(db->files[i].name, remap[dir], db->files[i].size, db->files[i].last_modified, active);
+ }
+ }
+
+ // at this point we're done with db->files, and done with remap
+ free(remap);
+
+ // now scan those directories using the standard scan
+ for (i=0; i < stb_arr_len(rescan); ++i) {
+ int z = rescan[i];
+ stb__dirtree_scandir(db->dirs[z].path, db->dirs[z].last_modified, active);
+ }
+ stb_arr_free(rescan);
+
+ return changes_detected;
+}
+
+static void stb__dirtree_free_raw(stb_dirtree *d)
+{
+ stb_free(d->string_pool);
+ stb_arr_free(d->dirs);
+ stb_arr_free(d->files);
+}
+
+stb_dirtree *stb_dirtree_get_with_file(char *dir, char *cache_file)
+{
+ stb_dirtree *output = (stb_dirtree *) malloc(sizeof(*output));
+ stb_dirtree db,active;
+ int prev_dir_count, cache_mismatch;
+
+ char *stripped_dir; // store the directory name without a trailing '/' or '\\'
+
+ // load the database of last-known state on disk
+ db.string_pool = NULL;
+ db.files = NULL;
+ db.dirs = NULL;
+
+ stripped_dir = stb_strip_final_slash(stb_p_strdup(dir));
+
+ if (cache_file != NULL)
+ stb__dirtree_load_db(cache_file, &db, stripped_dir);
+ else if (stb__showfile)
+ printf("No cache file\n");
+
+ active.files = NULL;
+ active.dirs = NULL;
+ active.string_pool = stb_malloc(0,1); // @TODO: share string pools between both?
+
+ // check all the directories in the database; make note if
+ // anything we scanned had changed, and rescan those things
+ cache_mismatch = stb__dirtree_update_db(&db, &active);
+
+ // check the root tree
+ prev_dir_count = stb_arr_len(active.dirs); // record how many directories we've seen
+
+ stb__dirtree_scandir(stripped_dir, 0, &active); // no last_modified time available for root
+
+ if (stb__dircount_mask)
+ printf(" \r");
+
+ // done with the DB; write it back out if any changes, i.e. either
+ // 1. any inconsistency found between cached information and actual disk
+ // or 2. if scanning the root found any new directories--which we detect because
+ // more than one directory got added to the active db during that scan
+ if (cache_mismatch || stb_arr_len(active.dirs) > prev_dir_count+1)
+ stb__dirtree_save_db(cache_file, &active, stripped_dir);
+
+ free(stripped_dir);
+
+ stb__dirtree_free_raw(&db);
+
+ *output = active;
+ return output;
+}
+
+stb_dirtree *stb_dirtree_get_dir(char *dir, char *cache_dir)
+{
+ int i;
+ stb_uint8 sha[20];
+ char dir_lower[1024];
+ char cache_file[1024],*s;
+ if (cache_dir == NULL)
+ return stb_dirtree_get_with_file(dir, NULL);
+ stb_p_strcpy_s(dir_lower, sizeof(dir_lower), dir);
+ stb_tolower(dir_lower);
+ stb_sha1(sha, (unsigned char *) dir_lower, (unsigned int) strlen(dir_lower));
+ stb_p_strcpy_s(cache_file, sizeof(cache_file), cache_dir);
+ s = cache_file + strlen(cache_file);
+ if (s[-1] != '/' && s[-1] != '\\') *s++ = '/';
+ stb_p_strcpy_s(s, sizeof(cache_file), "dirtree_");
+ s += strlen(s);
+ for (i=0; i < 8; ++i) {
+ char *hex = (char*)"0123456789abcdef";
+ stb_uint z = sha[i];
+ *s++ = hex[z >> 4];
+ *s++ = hex[z & 15];
+ }
+ stb_p_strcpy_s(s, sizeof(cache_file), ".bin");
+ return stb_dirtree_get_with_file(dir, cache_file);
+}
+
+stb_dirtree *stb_dirtree_get(char *dir)
+{
+ char cache_dir[256];
+ stb_p_strcpy_s(cache_dir, sizeof(cache_dir), "c:/bindata");
+ #ifdef STB_HAS_REGISTRY
+ {
+ void *reg = stb_reg_open("rHKLM", "Software\\SilverSpaceship\\stb");
+ if (reg) {
+ stb_reg_read(reg, "dirtree", cache_dir, sizeof(cache_dir));
+ stb_reg_close(reg);
+ }
+ }
+ #endif
+ return stb_dirtree_get_dir(dir, cache_dir);
+}
+
+void stb_dirtree_free(stb_dirtree *d)
+{
+ stb__dirtree_free_raw(d);
+ free(d);
+}
+
+void stb_dirtree_db_add_dir(stb_dirtree *active, char *path, time_t last)
+{
+ stb__dirtree_add_dir(path, last, active);
+}
+
+void stb_dirtree_db_add_file(stb_dirtree *active, char *name, int dir, stb_int64 size, time_t last)
+{
+ stb__dirtree_add_file(name, dir, size, last, active);
+}
+
+void stb_dirtree_db_read(stb_dirtree *target, char *filename, char *dir)
+{
+ char *s = stb_strip_final_slash(stb_p_strdup(dir));
+ target->dirs = 0;
+ target->files = 0;
+ target->string_pool = 0;
+ stb__dirtree_load_db(filename, target, s);
+ free(s);
+}
+
+void stb_dirtree_db_write(stb_dirtree *target, char *filename, char *dir)
+{
+ stb__dirtree_save_db(filename, target, 0); // don't strip out any directories
+}
+
+#endif // STB_DEFINE
+
+#endif // _WIN32
+#endif // STB_NO_STB_STRINGS
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// STB_MALLOC_WRAPPER
+//
+// you can use the wrapper functions with your own malloc wrapper,
+// or define STB_MALLOC_WRAPPER project-wide to have
+// malloc/free/realloc/strdup all get vectored to it
+
+// this has too many very specific error messages you could google for and find in stb.h,
+// so don't use it if you don't want any stb.h-identifiable strings
+#if defined(STB_DEFINE) && !defined(STB_NO_STB_STRINGS)
+
+typedef struct
+{
+ void *p;
+ char *file;
+ int line;
+ size_t size;
+} stb_malloc_record;
+
+#ifndef STB_MALLOC_HISTORY_COUNT
+#define STB_MALLOC_HISTORY_COUNT 50 // 800 bytes when pointers are 32 bits
+#endif
+
+stb_malloc_record *stb__allocations;
+static int stb__alloc_size, stb__alloc_limit, stb__alloc_mask;
+int stb__alloc_count;
+
+stb_malloc_record stb__alloc_history[STB_MALLOC_HISTORY_COUNT];
+int stb__history_pos;
+
+static int stb__hashfind(void *p)
+{
+ stb_uint32 h = stb_hashptr(p);
+ int s,n = h & stb__alloc_mask;
+ if (stb__allocations[n].p == p)
+ return n;
+ s = stb_rehash(h)|1;
+ for(;;) {
+ if (stb__allocations[n].p == NULL)
+ return -1;
+ n = (n+s) & stb__alloc_mask;
+ if (stb__allocations[n].p == p)
+ return n;
+ }
+}
+
+size_t stb_wrapper_allocsize(void *p)
+{
+ int n = stb__hashfind(p);
+ if (n < 0) return 0;
+ return stb__allocations[n].size;
+}
+
+static int stb__historyfind(void *p)
+{
+ int n = stb__history_pos;
+ int i;
+ for (i=0; i < STB_MALLOC_HISTORY_COUNT; ++i) {
+ if (--n < 0) n = STB_MALLOC_HISTORY_COUNT-1;
+ if (stb__alloc_history[n].p == p)
+ return n;
+ }
+ return -1;
+}
+
+static void stb__add_alloc(void *p, size_t sz, char *file, int line);
+static void stb__grow_alloc(void)
+{
+ int i,old_num = stb__alloc_size;
+ stb_malloc_record *old = stb__allocations;
+ if (stb__alloc_size == 0)
+ stb__alloc_size = 64;
+ else
+ stb__alloc_size *= 2;
+
+ stb__allocations = (stb_malloc_record *) stb__realloc_raw(NULL, stb__alloc_size * sizeof(stb__allocations[0]));
+ if (stb__allocations == NULL)
+ stb_fatal("Internal error: couldn't grow malloc wrapper table");
+ memset(stb__allocations, 0, stb__alloc_size * sizeof(stb__allocations[0]));
+ stb__alloc_limit = (stb__alloc_size*3)>>2;
+ stb__alloc_mask = stb__alloc_size-1;
+
+ stb__alloc_count = 0;
+
+ for (i=0; i < old_num; ++i)
+ if (old[i].p > STB_DEL) {
+ stb__add_alloc(old[i].p, old[i].size, old[i].file, old[i].line);
+ assert(stb__hashfind(old[i].p) >= 0);
+ }
+ for (i=0; i < old_num; ++i)
+ if (old[i].p > STB_DEL)
+ assert(stb__hashfind(old[i].p) >= 0);
+ stb__realloc_raw(old, 0);
+}
+
+static void stb__add_alloc(void *p, size_t sz, char *file, int line)
+{
+ stb_uint32 h;
+ int n;
+ if (stb__alloc_count >= stb__alloc_limit)
+ stb__grow_alloc();
+ h = stb_hashptr(p);
+ n = h & stb__alloc_mask;
+ if (stb__allocations[n].p > STB_DEL) {
+ int s = stb_rehash(h)|1;
+ do {
+ n = (n+s) & stb__alloc_mask;
+ } while (stb__allocations[n].p > STB_DEL);
+ }
+ assert(stb__allocations[n].p == NULL || stb__allocations[n].p == STB_DEL);
+ stb__allocations[n].p = p;
+ stb__allocations[n].size = sz;
+ stb__allocations[n].line = line;
+ stb__allocations[n].file = file;
+ ++stb__alloc_count;
+}
+
+static void stb__remove_alloc(int n, char *file, int line)
+{
+ stb__alloc_history[stb__history_pos] = stb__allocations[n];
+ stb__alloc_history[stb__history_pos].file = file;
+ stb__alloc_history[stb__history_pos].line = line;
+ if (++stb__history_pos == STB_MALLOC_HISTORY_COUNT)
+ stb__history_pos = 0;
+ stb__allocations[n].p = STB_DEL;
+ --stb__alloc_count;
+}
+
+void stb_wrapper_malloc(void *p, size_t sz, char *file, int line)
+{
+ if (!p) return;
+ stb__add_alloc(p,sz,file,line);
+}
+
+void stb_wrapper_free(void *p, char *file, int line)
+{
+ int n;
+
+ if (p == NULL) return;
+
+ n = stb__hashfind(p);
+
+ if (n >= 0)
+ stb__remove_alloc(n, file, line);
+ else {
+ // tried to free something we hadn't allocated!
+ n = stb__historyfind(p);
+ assert(0); /* NOTREACHED */
+ if (n >= 0)
+ stb_fatal("Attempted to free %d-byte block %p at %s:%d previously freed/realloced at %s:%d",
+ stb__alloc_history[n].size, p,
+ file, line,
+ stb__alloc_history[n].file, stb__alloc_history[n].line);
+ else
+ stb_fatal("Attempted to free unknown block %p at %s:%d", p, file,line);
+ }
+}
+
+void stb_wrapper_check(void *p)
+{
+ int n;
+
+ if (p == NULL) return;
+
+ n = stb__hashfind(p);
+
+ if (n >= 0) return;
+
+ for (n=0; n < stb__alloc_size; ++n)
+ if (stb__allocations[n].p == p)
+ stb_fatal("Internal error: pointer %p was allocated, but hash search failed", p);
+
+ // tried to free something that wasn't allocated!
+ n = stb__historyfind(p);
+ if (n >= 0)
+ stb_fatal("Checked %d-byte block %p previously freed/realloced at %s:%d",
+ stb__alloc_history[n].size, p,
+ stb__alloc_history[n].file, stb__alloc_history[n].line);
+ stb_fatal("Checked unknown block %p", p);
+}
+
+void stb_wrapper_realloc(void *p, void *q, size_t sz, char *file, int line)
+{
+ int n;
+ if (p == NULL) { stb_wrapper_malloc(q, sz, file, line); return; }
+ if (q == NULL) return; // nothing happened
+
+ n = stb__hashfind(p);
+ if (n == -1) {
+ // tried to free something we hadn't allocated!
+ // this is weird, though, because we got past the realloc!
+ n = stb__historyfind(p);
+ assert(0); /* NOTREACHED */
+ if (n >= 0)
+ stb_fatal("Attempted to realloc %d-byte block %p at %s:%d previously freed/realloced at %s:%d",
+ stb__alloc_history[n].size, p,
+ file, line,
+ stb__alloc_history[n].file, stb__alloc_history[n].line);
+ else
+ stb_fatal("Attempted to realloc unknown block %p at %s:%d", p, file,line);
+ } else {
+ if (q == p) {
+ stb__allocations[n].size = sz;
+ stb__allocations[n].file = file;
+ stb__allocations[n].line = line;
+ } else {
+ stb__remove_alloc(n, file, line);
+ stb__add_alloc(q,sz,file,line);
+ }
+ }
+}
+
+void stb_wrapper_listall(void (*func)(void *ptr, size_t sz, char *file, int line))
+{
+ int i;
+ for (i=0; i < stb__alloc_size; ++i)
+ if (stb__allocations[i].p > STB_DEL)
+ func(stb__allocations[i].p , stb__allocations[i].size,
+ stb__allocations[i].file, stb__allocations[i].line);
+}
+
+void stb_wrapper_dump(char *filename)
+{
+ int i;
+ FILE *f = stb_p_fopen(filename, "w");
+ if (!f) return;
+ for (i=0; i < stb__alloc_size; ++i)
+ if (stb__allocations[i].p > STB_DEL)
+ fprintf(f, "%p %7d - %4d %s\n",
+ stb__allocations[i].p , (int) stb__allocations[i].size,
+ stb__allocations[i].line, stb__allocations[i].file);
+}
+#endif // STB_DEFINE
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_pointer_set
+//
+//
+// For data structures that support querying by key, data structure
+// classes always hand-wave away the issue of what to do if two entries
+// have the same key: basically, store a linked list of all the nodes
+// which have the same key (a LISP-style list).
+//
+// The thing is, it's not that trivial. If you have an O(log n)
+// lookup data structure, but then n/4 items have the same value,
+// you don't want to spend O(n) time scanning that list when
+// deleting an item if you already have a pointer to the item.
+// (You have to spend O(n) time enumerating all the items with
+// a given key, sure, and you can't accelerate deleting a particular
+// item if you only have the key, not a pointer to the item.)
+//
+// I'm going to call this data structure, whatever it turns out to
+// be, a "pointer set", because we don't store any associated data for
+// items in this data structure, we just answer the question of
+// whether an item is in it or not (it's effectively one bit per pointer).
+// Technically they don't have to be pointers; you could cast ints
+// to (void *) if you want, but you can't store 0 or 1 because of the
+// hash table.
+//
+// Since the fastest data structure we might want to add support for
+// identical-keys to is a hash table with O(1)-ish lookup time,
+// that means that the conceptual "linked list of all items with
+// the same indexed value" that we build needs to have the same
+// performance; that way when we index a table we think is arbitrary
+// ints, but in fact half of them are 0, we don't get screwed.
+//
+// Therefore, it needs to be a hash table, at least when it gets
+// large. On the other hand, when the data has totally arbitrary ints
+// or floats, there won't be many collisions, and we'll have tons of
+// 1-item sets. Those would be grossly inefficient as hash tables;
+// that's the trade-off: the hash table is reasonably efficient per-item when
+// it's large, but not when it's small. So we need to do something
+// Judy-like and use different strategies depending on the size.
+//
+// Like Judy, we'll use the bottom bit to encode the strategy:
+//
+// bottom bits:
+// 00 - direct pointer
+// 01 - 4-item bucket (16 bytes, no length, NULLs)
+// 10 - N-item array
+// 11 - hash table
+
+typedef struct stb_ps stb_ps;
+
+STB_EXTERN int stb_ps_find (stb_ps *ps, void *value);
+STB_EXTERN stb_ps * stb_ps_add (stb_ps *ps, void *value);
+STB_EXTERN stb_ps * stb_ps_remove(stb_ps *ps, void *value);
+STB_EXTERN stb_ps * stb_ps_remove_any(stb_ps *ps, void **value);
+STB_EXTERN void stb_ps_delete(stb_ps *ps);
+STB_EXTERN int stb_ps_count (stb_ps *ps);
+
+STB_EXTERN stb_ps * stb_ps_copy (stb_ps *ps);
+STB_EXTERN int stb_ps_subset(stb_ps *bigger, stb_ps *smaller);
+STB_EXTERN int stb_ps_eq (stb_ps *p0, stb_ps *p1);
+
+STB_EXTERN void ** stb_ps_getlist (stb_ps *ps, int *count);
+STB_EXTERN int stb_ps_writelist(stb_ps *ps, void **list, int size );
+
+// enum and fastlist don't allocate storage, but you must consume the
+// list before there's any chance the data structure gets screwed up;
+STB_EXTERN int stb_ps_enum (stb_ps *ps, void *data,
+ int (*func)(void *value, void*data) );
+STB_EXTERN void ** stb_ps_fastlist(stb_ps *ps, int *count);
+// result:
+// returns a list, *count is the length of that list,
+// but some entries of the list may be invalid;
+// test with 'stb_ps_fastlist_valid(x)'
+
+#define stb_ps_fastlist_valid(x) ((stb_uinta) (x) > 1)
+
+#ifdef STB_DEFINE
+
+enum
+{
+ STB_ps_direct = 0,
+ STB_ps_bucket = 1,
+ STB_ps_array = 2,
+ STB_ps_hash = 3,
+};
+
+#define STB_BUCKET_SIZE 4
+
+typedef struct
+{
+ void *p[STB_BUCKET_SIZE];
+} stb_ps_bucket;
+#define GetBucket(p) ((stb_ps_bucket *) ((char *) (p) - STB_ps_bucket))
+#define EncodeBucket(p) ((stb_ps *) ((char *) (p) + STB_ps_bucket))
+
+static void stb_bucket_free(stb_ps_bucket *b)
+{
+ free(b);
+}
+
+static stb_ps_bucket *stb_bucket_create2(void *v0, void *v1)
+{
+ stb_ps_bucket *b = (stb_ps_bucket*) malloc(sizeof(*b));
+ b->p[0] = v0;
+ b->p[1] = v1;
+ b->p[2] = NULL;
+ b->p[3] = NULL;
+ return b;
+}
+
+static stb_ps_bucket * stb_bucket_create3(void **v)
+{
+ stb_ps_bucket *b = (stb_ps_bucket*) malloc(sizeof(*b));
+ b->p[0] = v[0];
+ b->p[1] = v[1];
+ b->p[2] = v[2];
+ b->p[3] = NULL;
+ return b;
+}
+
+
+// could use stb_arr, but this will save us memory
+typedef struct
+{
+ int count;
+ void *p[1];
+} stb_ps_array;
+#define GetArray(p) ((stb_ps_array *) ((char *) (p) - STB_ps_array))
+#define EncodeArray(p) ((stb_ps *) ((char *) (p) + STB_ps_array))
+
+static int stb_ps_array_max = 13;
+
+typedef struct
+{
+ int size, mask;
+ int count, count_deletes;
+ int grow_threshhold;
+ int shrink_threshhold;
+ int rehash_threshhold;
+ int any_offset;
+ void *table[1];
+} stb_ps_hash;
+#define GetHash(p) ((stb_ps_hash *) ((char *) (p) - STB_ps_hash))
+#define EncodeHash(p) ((stb_ps *) ((char *) (p) + STB_ps_hash))
+
+#define stb_ps_empty(v) (((size_t) (v)) <= 1) // size_t, not stb_uint32: avoid truncating 64-bit pointers
+
+static stb_ps_hash *stb_ps_makehash(int size, int old_size, void **old_data)
+{
+ int i;
+ stb_ps_hash *h = (stb_ps_hash *) malloc(sizeof(*h) + (size-1) * sizeof(h->table[0]));
+ assert(stb_is_pow2(size));
+ h->size = size;
+ h->mask = size-1;
+ h->shrink_threshhold = (int) (0.3f * size);
+ h-> grow_threshhold = (int) (0.8f * size);
+ h->rehash_threshhold = (int) (0.9f * size);
+ h->count = 0;
+ h->count_deletes = 0;
+ h->any_offset = 0;
+ memset(h->table, 0, size * sizeof(h->table[0]));
+ for (i=0; i < old_size; ++i)
+ if (!stb_ps_empty((size_t)old_data[i]))
+ stb_ps_add(EncodeHash(h), old_data[i]);
+ return h;
+}
+
+void stb_ps_delete(stb_ps *ps)
+{
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct: break;
+ case STB_ps_bucket: stb_bucket_free(GetBucket(ps)); break;
+ case STB_ps_array : free(GetArray(ps)); break;
+ case STB_ps_hash : free(GetHash(ps)); break;
+ }
+}
+
+stb_ps *stb_ps_copy(stb_ps *ps)
+{
+ int i;
+ // cases ordered based on expected performance/power-law distribution
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct: return ps;
+ case STB_ps_bucket: {
+ stb_ps_bucket *n = (stb_ps_bucket *) malloc(sizeof(*n));
+ *n = *GetBucket(ps);
+ return EncodeBucket(n);
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ stb_ps_array *n = (stb_ps_array *) malloc(sizeof(*n) + stb_ps_array_max * sizeof(n->p[0]));
+ n->count = a->count;
+ for (i=0; i < a->count; ++i)
+ n->p[i] = a->p[i];
+ return EncodeArray(n);
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ stb_ps_hash *n = stb_ps_makehash(h->size, h->size, h->table);
+ return EncodeHash(n);
+ }
+ }
+ assert(0); /* NOTREACHED */
+ return NULL;
+}
+
+int stb_ps_find(stb_ps *ps, void *value)
+{
+ int i, code = 3 & (int)(size_t) ps;
+ assert((3 & (int)(size_t) value) == STB_ps_direct);
+ assert(stb_ps_fastlist_valid(value));
+ // not a switch: order based on expected performance/power-law distribution
+ if (code == STB_ps_direct)
+ return value == ps;
+ if (code == STB_ps_bucket) {
+ stb_ps_bucket *b = GetBucket(ps);
+ assert(STB_BUCKET_SIZE == 4);
+ if (b->p[0] == value || b->p[1] == value ||
+ b->p[2] == value || b->p[3] == value)
+ return STB_TRUE;
+ return STB_FALSE;
+ }
+ if (code == STB_ps_array) {
+ stb_ps_array *a = GetArray(ps);
+ for (i=0; i < a->count; ++i)
+ if (a->p[i] == value)
+ return STB_TRUE;
+ return STB_FALSE;
+ } else {
+ stb_ps_hash *h = GetHash(ps);
+ stb_uint32 hash = stb_hashptr(value);
+ stb_uint32 s, n = hash & h->mask;
+ void **t = h->table;
+ if (t[n] == value) return STB_TRUE;
+ if (t[n] == NULL) return STB_FALSE;
+ s = stb_rehash(hash) | 1;
+ do {
+ n = (n + s) & h->mask;
+ if (t[n] == value) return STB_TRUE;
+ } while (t[n] != NULL);
+ return STB_FALSE;
+ }
+}
+
+stb_ps * stb_ps_add (stb_ps *ps, void *value)
+{
+ #ifdef STB_DEBUG
+ assert(!stb_ps_find(ps,value));
+ #endif
+ if (value == NULL) return ps; // ignore NULL adds to avoid bad breakage
+ assert((3 & (int)(size_t) value) == STB_ps_direct);
+ assert(stb_ps_fastlist_valid(value));
+ assert(value != STB_DEL); // STB_DEL is less likely
+
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ if (ps == NULL) return (stb_ps *) value;
+ return EncodeBucket(stb_bucket_create2(ps,value));
+
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ stb_ps_array *a;
+ assert(STB_BUCKET_SIZE == 4);
+ if (b->p[0] == NULL) { b->p[0] = value; return ps; }
+ if (b->p[1] == NULL) { b->p[1] = value; return ps; }
+ if (b->p[2] == NULL) { b->p[2] = value; return ps; }
+ if (b->p[3] == NULL) { b->p[3] = value; return ps; }
+ a = (stb_ps_array *) malloc(sizeof(*a) + 7 * sizeof(a->p[0])); // 8 slots, must be 2^k
+ memcpy(a->p, b, sizeof(*b));
+ a->p[4] = value;
+ a->count = 5;
+ stb_bucket_free(b);
+ return EncodeArray(a);
+ }
+
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ if (a->count == stb_ps_array_max) {
+ // promote from array to hash
+ stb_ps_hash *h = stb_ps_makehash(2 << stb_log2_ceil(a->count), a->count, a->p);
+ free(a);
+ return stb_ps_add(EncodeHash(h), value);
+ }
+ // do we need to resize the array? the array doubles in size when it
+ // crosses a power-of-two
+ if ((a->count & (a->count-1))==0) {
+ int newsize = a->count*2;
+ // clamp newsize to max if:
+ // 1. it's larger than max
+ // 2. newsize*1.5 is larger than max (to avoid extra resizing)
+ if (newsize + a->count > stb_ps_array_max)
+ newsize = stb_ps_array_max;
+ a = (stb_ps_array *) realloc(a, sizeof(*a) + (newsize-1) * sizeof(a->p[0]));
+ }
+ a->p[a->count++] = value;
+ return EncodeArray(a);
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ stb_uint32 hash = stb_hashptr(value);
+ stb_uint32 n = hash & h->mask;
+ void **t = h->table;
+ // find first NULL or STB_DEL entry
+ if (!stb_ps_empty((size_t)t[n])) {
+ stb_uint32 s = stb_rehash(hash) | 1;
+ do {
+ n = (n + s) & h->mask;
+ } while (!stb_ps_empty((size_t)t[n]));
+ }
+ if (t[n] == STB_DEL)
+ -- h->count_deletes;
+ t[n] = value;
+ ++ h->count;
+ if (h->count == h->grow_threshhold) {
+ stb_ps_hash *h2 = stb_ps_makehash(h->size*2, h->size, t);
+ free(h);
+ return EncodeHash(h2);
+ }
+ if (h->count + h->count_deletes == h->rehash_threshhold) {
+ stb_ps_hash *h2 = stb_ps_makehash(h->size, h->size, t);
+ free(h);
+ return EncodeHash(h2);
+ }
+ return ps;
+ }
+ }
+ return NULL; /* NOTREACHED */
+}
+
+stb_ps *stb_ps_remove(stb_ps *ps, void *value)
+{
+ #ifdef STB_DEBUG
+ assert(stb_ps_find(ps, value));
+ #endif
+ assert((3 & (int)(size_t) value) == STB_ps_direct);
+ if (value == NULL) return ps; // ignore NULL removes to avoid bad breakage
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ return ps == value ? NULL : ps;
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ int count=0;
+ assert(STB_BUCKET_SIZE == 4);
+ if (b->p[0] == value) b->p[0] = NULL; else count += (b->p[0] != NULL);
+ if (b->p[1] == value) b->p[1] = NULL; else count += (b->p[1] != NULL);
+ if (b->p[2] == value) b->p[2] = NULL; else count += (b->p[2] != NULL);
+ if (b->p[3] == value) b->p[3] = NULL; else count += (b->p[3] != NULL);
+ if (count == 1) { // shrink bucket at size 1
+ value = b->p[0];
+ if (value == NULL) value = b->p[1];
+ if (value == NULL) value = b->p[2];
+ if (value == NULL) value = b->p[3];
+ assert(value != NULL);
+ stb_bucket_free(b);
+ return (stb_ps *) value; // return STB_ps_direct of value
+ }
+ return ps;
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ int i;
+ for (i=0; i < a->count; ++i) {
+ if (a->p[i] == value) {
+ a->p[i] = a->p[--a->count];
+ if (a->count == 3) { // shrink to bucket!
+ stb_ps_bucket *b = stb_bucket_create3(a->p);
+ free(a);
+ return EncodeBucket(b);
+ }
+ return ps;
+ }
+ }
+ return ps;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ stb_uint32 hash = stb_hashptr(value);
+ stb_uint32 s, n = hash & h->mask;
+ void **t = h->table;
+ if (t[n] != value) {
+ s = stb_rehash(hash) | 1;
+ do {
+ n = (n + s) & h->mask;
+ } while (t[n] != value);
+ }
+ t[n] = STB_DEL;
+ -- h->count;
+ ++ h->count_deletes;
+ // should we shrink down to an array?
+ if (h->count < stb_ps_array_max) {
+ int n = 1 << stb_log2_floor(stb_ps_array_max);
+ if (h->count < n) {
+ stb_ps_array *a = (stb_ps_array *) malloc(sizeof(*a) + (n-1) * sizeof(a->p[0]));
+ int i,j=0;
+ for (i=0; i < h->size; ++i)
+ if (!stb_ps_empty((size_t)t[i]))
+ a->p[j++] = t[i];
+ assert(j == h->count);
+ a->count = j;
+ free(h);
+ return EncodeArray(a);
+ }
+ }
+ if (h->count == h->shrink_threshhold) {
+ stb_ps_hash *h2 = stb_ps_makehash(h->size >> 1, h->size, t);
+ free(h);
+ return EncodeHash(h2);
+ }
+ return ps;
+ }
+ }
+ return ps; /* NOTREACHED */
+}
+
+stb_ps *stb_ps_remove_any(stb_ps *ps, void **value)
+{
+ assert(ps != NULL);
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ *value = ps;
+ return NULL;
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ int count=0, slast=0, last=0;
+ assert(STB_BUCKET_SIZE == 4);
+ if (b->p[0]) { ++count; last = 0; }
+ if (b->p[1]) { ++count; slast = last; last = 1; }
+ if (b->p[2]) { ++count; slast = last; last = 2; }
+ if (b->p[3]) { ++count; slast = last; last = 3; }
+ *value = b->p[last];
+ b->p[last] = 0;
+ if (count == 2) {
+ void *leftover = b->p[slast]; // second to last
+ stb_bucket_free(b);
+ return (stb_ps *) leftover;
+ }
+ return ps;
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ *value = a->p[a->count-1];
+ if (a->count == 4)
+ return stb_ps_remove(ps, *value);
+ --a->count;
+ return ps;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ void **t = h->table;
+ stb_uint32 n = h->any_offset;
+ while (stb_ps_empty((size_t)t[n]))
+ n = (n + 1) & h->mask;
+ *value = t[n];
+ h->any_offset = (n+1) & h->mask;
+ // check if we need to skip down to the previous type
+ if (h->count-1 < stb_ps_array_max || h->count-1 == h->shrink_threshhold)
+ return stb_ps_remove(ps, *value);
+ t[n] = STB_DEL;
+ -- h->count;
+ ++ h->count_deletes;
+ return ps;
+ }
+ }
+ return ps; /* NOTREACHED */
+}
+
+
+void ** stb_ps_getlist(stb_ps *ps, int *count)
+{
+ int i,n=0;
+ void **p = NULL;
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ if (ps == NULL) { *count = 0; return NULL; }
+ p = (void **) malloc(sizeof(*p) * 1);
+ p[0] = ps;
+ *count = 1;
+ return p;
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ p = (void **) malloc(sizeof(*p) * STB_BUCKET_SIZE);
+ for (i=0; i < STB_BUCKET_SIZE; ++i)
+ if (b->p[i] != NULL)
+ p[n++] = b->p[i];
+ break;
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ p = (void **) malloc(sizeof(*p) * a->count);
+ memcpy(p, a->p, sizeof(*p) * a->count);
+ *count = a->count;
+ return p;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ p = (void **) malloc(sizeof(*p) * h->count);
+ for (i=0; i < h->size; ++i)
+ if (!stb_ps_empty((size_t)h->table[i]))
+ p[n++] = h->table[i];
+ break;
+ }
+ }
+ *count = n;
+ return p;
+}
+
+int stb_ps_writelist(stb_ps *ps, void **list, int size )
+{
+ int i,n=0;
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ if (ps == NULL || size <= 0) return 0;
+ list[0] = ps;
+ return 1;
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ for (i=0; i < STB_BUCKET_SIZE; ++i)
+ if (b->p[i] != NULL && n < size)
+ list[n++] = b->p[i];
+ return n;
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ n = stb_min(size, a->count);
+ memcpy(list, a->p, sizeof(*list) * n);
+ return n;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ if (size <= 0) return 0;
+ for (i=0; i < h->size; ++i) {
+ if (!stb_ps_empty((size_t)h->table[i])) {
+ list[n++] = h->table[i];
+ if (n == size) break;
+ }
+ }
+ return n;
+ }
+ }
+ return 0; /* NOTREACHED */
+}
+
+int stb_ps_enum(stb_ps *ps, void *data, int (*func)(void *value, void *data))
+{
+ int i;
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ if (ps == NULL) return STB_TRUE;
+ return func(ps, data);
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ for (i=0; i < STB_BUCKET_SIZE; ++i)
+ if (b->p[i] != NULL)
+ if (!func(b->p[i], data))
+ return STB_FALSE;
+ return STB_TRUE;
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ for (i=0; i < a->count; ++i)
+ if (!func(a->p[i], data))
+ return STB_FALSE;
+ return STB_TRUE;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ for (i=0; i < h->size; ++i)
+ if (!stb_ps_empty((size_t)h->table[i]))
+ if (!func(h->table[i], data))
+ return STB_FALSE;
+ return STB_TRUE;
+ }
+ }
+ return STB_TRUE; /* NOTREACHED */
+}
+
+int stb_ps_count (stb_ps *ps)
+{
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ return ps != NULL;
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ return (b->p[0] != NULL) + (b->p[1] != NULL) +
+ (b->p[2] != NULL) + (b->p[3] != NULL);
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ return a->count;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ return h->count;
+ }
+ }
+ return 0;
+}
+
+void ** stb_ps_fastlist(stb_ps *ps, int *count)
+{
+ static void *storage;
+
+ switch (3 & (int)(size_t) ps) {
+ case STB_ps_direct:
+ if (ps == NULL) { *count = 0; return NULL; }
+ storage = ps;
+ *count = 1;
+ return &storage;
+ case STB_ps_bucket: {
+ stb_ps_bucket *b = GetBucket(ps);
+ *count = STB_BUCKET_SIZE;
+ return b->p;
+ }
+ case STB_ps_array: {
+ stb_ps_array *a = GetArray(ps);
+ *count = a->count;
+ return a->p;
+ }
+ case STB_ps_hash: {
+ stb_ps_hash *h = GetHash(ps);
+ *count = h->size;
+ return h->table;
+ }
+ }
+ return NULL; /* NOTREACHED */
+}
+
+int stb_ps_subset(stb_ps *bigger, stb_ps *smaller)
+{
+ int i, listlen;
+ void **list = stb_ps_fastlist(smaller, &listlen);
+ for(i=0; i < listlen; ++i)
+ if (stb_ps_fastlist_valid(list[i]))
+ if (!stb_ps_find(bigger, list[i]))
+ return 0;
+ return 1;
+}
+
+int stb_ps_eq(stb_ps *p0, stb_ps *p1)
+{
+ if (stb_ps_count(p0) != stb_ps_count(p1))
+ return 0;
+ return stb_ps_subset(p0, p1);
+}
+
+#undef GetBucket
+#undef GetArray
+#undef GetHash
+
+#undef EncodeBucket
+#undef EncodeArray
+#undef EncodeHash
+
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Random Numbers via Mersenne Twister or LCG
+//
+
+STB_EXTERN unsigned int stb_srandLCG(unsigned int seed);
+STB_EXTERN unsigned int stb_randLCG(void);
+STB_EXTERN double stb_frandLCG(void);
+
+STB_EXTERN void stb_srand(unsigned int seed);
+STB_EXTERN unsigned int stb_rand(void);
+STB_EXTERN double stb_frand(void);
+STB_EXTERN void stb_shuffle(void *p, size_t n, size_t sz,
+ unsigned int seed);
+STB_EXTERN void stb_reverse(void *p, size_t n, size_t sz);
+
+STB_EXTERN unsigned int stb_randLCG_explicit(unsigned int seed);
+
+#define stb_rand_define(x,y) \
+ \
+ unsigned int x(void) \
+ { \
+ static unsigned int stb__rand = y; \
+ stb__rand = stb__rand * 2147001325 + 715136305; /* BCPL */ \
+ return 0x31415926 ^ ((stb__rand >> 16) + (stb__rand << 16)); \
+ }
+
+#ifdef STB_DEFINE
+unsigned int stb_randLCG_explicit(unsigned int seed)
+{
+ return seed * 2147001325 + 715136305;
+}
+
+static unsigned int stb__rand_seed=0;
+
+unsigned int stb_srandLCG(unsigned int seed)
+{
+ unsigned int previous = stb__rand_seed;
+ stb__rand_seed = seed;
+ return previous;
+}
+
+unsigned int stb_randLCG(void)
+{
+ stb__rand_seed = stb__rand_seed * 2147001325 + 715136305; // BCPL generator
+ // shuffle non-random bits to the middle, and xor to decorrelate with seed
+ return 0x31415926 ^ ((stb__rand_seed >> 16) + (stb__rand_seed << 16));
+}
+
+double stb_frandLCG(void)
+{
+ return stb_randLCG() / ((double) (1 << 16) * (1 << 16));
+}
+
+void stb_shuffle(void *p, size_t n, size_t sz, unsigned int seed)
+{
+ char *a;
+ unsigned int old_seed;
+ int i;
+ if (seed)
+ old_seed = stb_srandLCG(seed);
+ a = (char *) p + (n-1) * sz;
+
+ for (i=(int) n; i > 1; --i) {
+ int j = stb_randLCG() % i;
+ stb_swap(a, (char *) p + j * sz, sz);
+ a -= sz;
+ }
+ if (seed)
+ stb_srandLCG(old_seed);
+}
+
+void stb_reverse(void *p, size_t n, size_t sz)
+{
+ size_t i,j;
+ if (n == 0) return; // avoid size_t wraparound on n-1
+ j = n-1;
+ for (i=0; i < j; ++i,--j) {
+ stb_swap((char *) p + i * sz, (char *) p + j * sz, sz);
+ }
+}
+
+// public domain Mersenne Twister by Michael Brundage
+#define STB__MT_LEN 624
+
+int stb__mt_index = STB__MT_LEN*sizeof(int)+1;
+unsigned int stb__mt_buffer[STB__MT_LEN];
+
+void stb_srand(unsigned int seed)
+{
+ int i;
+ stb__mt_buffer[0]= seed & 0xffffffffUL;
+ for (i=1 ; i < STB__MT_LEN; ++i)
+ stb__mt_buffer[i] = (1812433253UL * (stb__mt_buffer[i-1] ^ (stb__mt_buffer[i-1] >> 30)) + i);
+ stb__mt_index = STB__MT_LEN*sizeof(unsigned int);
+}
+
+#define STB__MT_IA 397
+#define STB__MT_IB (STB__MT_LEN - STB__MT_IA)
+#define STB__UPPER_MASK 0x80000000
+#define STB__LOWER_MASK 0x7FFFFFFF
+#define STB__MATRIX_A 0x9908B0DF
+#define STB__TWIST(b,i,j) ((b)[i] & STB__UPPER_MASK) | ((b)[j] & STB__LOWER_MASK)
+#define STB__MAGIC(s) (((s)&1)*STB__MATRIX_A)
+
+unsigned int stb_rand(void)
+{
+ unsigned int * b = stb__mt_buffer;
+ int idx = stb__mt_index;
+ unsigned int s,r;
+ int i;
+
+ if (idx >= STB__MT_LEN*sizeof(unsigned int)) {
+ if (idx > STB__MT_LEN*sizeof(unsigned int))
+ stb_srand(0);
+ idx = 0;
+ i = 0;
+ for (; i < STB__MT_IB; i++) {
+ s = STB__TWIST(b, i, i+1);
+ b[i] = b[i + STB__MT_IA] ^ (s >> 1) ^ STB__MAGIC(s);
+ }
+ for (; i < STB__MT_LEN-1; i++) {
+ s = STB__TWIST(b, i, i+1);
+ b[i] = b[i - STB__MT_IB] ^ (s >> 1) ^ STB__MAGIC(s);
+ }
+
+ s = STB__TWIST(b, STB__MT_LEN-1, 0);
+ b[STB__MT_LEN-1] = b[STB__MT_IA-1] ^ (s >> 1) ^ STB__MAGIC(s);
+ }
+ stb__mt_index = idx + sizeof(unsigned int);
+
+ r = *(unsigned int *)((unsigned char *)b + idx);
+
+ r ^= (r >> 11);
+ r ^= (r << 7) & 0x9D2C5680;
+ r ^= (r << 15) & 0xEFC60000;
+ r ^= (r >> 18);
+
+ return r;
+}
+
+double stb_frand(void)
+{
+ return stb_rand() / ((double) (1 << 16) * (1 << 16));
+}
+
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_dupe
+//
+// stb_dupe is a duplicate-finding system for very, very large data
+// structures--large enough that sorting is too slow, but not so large
+// that we can't keep all the data in memory. using it works as follows:
+//
+// 1. create an stb_dupe:
+// provide a hash function
+// provide an equality function
+// provide an estimate for the size
+// optionally provide a comparison function
+//
+// 2. traverse your data, 'adding' pointers to the stb_dupe
+//
+// 3. finish and ask for duplicates
+//
+// the stb_dupe will discard its intermediate data and build
+// a collection of sorted lists of duplicates, with non-duplicate
+// entries omitted entirely
+//
+//
+// Implementation strategy:
+//
+// while collecting the N items, we keep a hash table of approximate
+// size sqrt(N). (if you tell us the N up front, the hash table is
+// just that size exactly)
+//
+// each entry in the hash table is just an stb__arr of pointers (no need
+// to use stb_ps, because we don't need to delete from these)
+//
+// for step 3, for each entry in the hash table, we apply stb_dupe to it
+// recursively. once the size gets small enough (or doesn't decrease
+// significantly), we switch to either using qsort() on the comparison
+// function, or else we just do the icky N^2 gather
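+//
+// for example, the three steps look like this in code (an illustrative
+// sketch; my_hash, my_eq, and 'items' are hypothetical):
+//
+//    stb_dupe *sd = stb_dupe_create(my_hash, my_eq, num_items, NULL);
+//    for (i=0; i < num_items; ++i)
+//       stb_dupe_add(sd, items[i]);
+//    stb_dupe_finish(sd);
+//    for (i=0; i < stb_dupe_numsets(sd); ++i) {
+//       void **set = stb_dupe_set(sd, i);        // one set of duplicates
+//       int n      = stb_dupe_set_count(sd, i);  // its length
+//       // ... process set[0..n-1] ...
+//    }
+//    stb_dupe_free(sd);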
+
+
+typedef struct stb_dupe stb_dupe;
+
+typedef int (*stb_compare_func)(void *a, void *b);
+typedef int (*stb_hash_func)(void *a, unsigned int seed);
+
+STB_EXTERN void stb_dupe_free(stb_dupe *sd);
+STB_EXTERN stb_dupe *stb_dupe_create(stb_hash_func hash,
+ stb_compare_func eq, int size, stb_compare_func ineq);
+STB_EXTERN void stb_dupe_add(stb_dupe *sd, void *item);
+STB_EXTERN void stb_dupe_finish(stb_dupe *sd);
+STB_EXTERN int stb_dupe_numsets(stb_dupe *sd);
+STB_EXTERN void **stb_dupe_set(stb_dupe *sd, int num);
+STB_EXTERN int stb_dupe_set_count(stb_dupe *sd, int num);
+
+struct stb_dupe
+{
+ void ***hash_table;
+ int hash_size;
+ int size_log2;
+ int population;
+
+ int hash_shift;
+ stb_hash_func hash;
+
+ stb_compare_func eq;
+ stb_compare_func ineq;
+
+ void ***dupes;
+};
+
+#ifdef STB_DEFINE
+
+int stb_dupe_numsets(stb_dupe *sd)
+{
+ assert(sd->hash_table == NULL);
+ return stb_arr_len(sd->dupes);
+}
+
+void **stb_dupe_set(stb_dupe *sd, int num)
+{
+ assert(sd->hash_table == NULL);
+ return sd->dupes[num];
+}
+
+int stb_dupe_set_count(stb_dupe *sd, int num)
+{
+ assert(sd->hash_table == NULL);
+ return stb_arr_len(sd->dupes[num]);
+}
+
+stb_dupe *stb_dupe_create(stb_hash_func hash, stb_compare_func eq, int size,
+ stb_compare_func ineq)
+{
+ int i, hsize;
+ stb_dupe *sd = (stb_dupe *) malloc(sizeof(*sd));
+
+ sd->size_log2 = 4;
+ hsize = 1 << sd->size_log2;
+ while (hsize * hsize < size) {
+ ++sd->size_log2;
+ hsize *= 2;
+ }
+
+ sd->hash = hash;
+ sd->eq = eq;
+ sd->ineq = ineq;
+ sd->hash_shift = 0;
+
+ sd->population = 0;
+ sd->hash_size = hsize;
+ sd->hash_table = (void ***) malloc(sizeof(*sd->hash_table) * hsize);
+ for (i=0; i < hsize; ++i)
+ sd->hash_table[i] = NULL;
+
+ sd->dupes = NULL;
+
+ return sd;
+}
+
+void stb_dupe_add(stb_dupe *sd, void *item)
+{
+ stb_uint32 hash = sd->hash(item, sd->hash_shift);
+ int z = hash & (sd->hash_size-1);
+ stb_arr_push(sd->hash_table[z], item);
+ ++sd->population;
+}
+
+void stb_dupe_free(stb_dupe *sd)
+{
+ int i;
+ for (i=0; i < stb_arr_len(sd->dupes); ++i)
+ if (sd->dupes[i])
+ stb_arr_free(sd->dupes[i]);
+ stb_arr_free(sd->dupes);
+ free(sd);
+}
+
+static stb_compare_func stb__compare;
+
+static int stb__dupe_compare(const void *a, const void *b)
+{
+ void *p = *(void **) a;
+ void *q = *(void **) b;
+
+ return stb__compare(p,q);
+}
+
+void stb_dupe_finish(stb_dupe *sd)
+{
+ int i,j,k;
+ assert(sd->dupes == NULL);
+ for (i=0; i < sd->hash_size; ++i) {
+ void ** list = sd->hash_table[i];
+ if (list != NULL) {
+ int n = stb_arr_len(list);
+ // @TODO: measure to find good numbers instead of just making them up!
+ int thresh = (sd->ineq ? 200 : 20);
+ // if n is large enough to be worth it, and n is smaller than
+ // before (so we can guarantee we'll use a smaller hash table);
+ // and there are enough hash bits left, assuming full 32-bit hash
+ if (n > thresh && n < (sd->population >> 3) && sd->hash_shift + sd->size_log2*2 < 32) {
+
+ // recursively process this row using stb_dupe, O(N log log N)
+
+ stb_dupe *d = stb_dupe_create(sd->hash, sd->eq, n, sd->ineq);
+ d->hash_shift = stb_randLCG_explicit(sd->hash_shift);
+ for (j=0; j < n; ++j)
+ stb_dupe_add(d, list[j]);
+ stb_arr_free(sd->hash_table[i]);
+ stb_dupe_finish(d);
+ for (j=0; j < stb_arr_len(d->dupes); ++j) {
+ stb_arr_push(sd->dupes, d->dupes[j]);
+ d->dupes[j] = NULL; // take over ownership
+ }
+ stb_dupe_free(d);
+
+ } else if (sd->ineq) {
+
+ // process this row using qsort(), O(N log N)
+ stb__compare = sd->ineq;
+ qsort(list, n, sizeof(list[0]), stb__dupe_compare);
+
+ // find equal subsequences of the list
+ for (j=0; j < n-1; ) {
+ // find a subsequence from j..k
+ for (k=j; k < n; ++k)
+ // only use ineq so eq can be left undefined
+ if (sd->ineq(list[j], list[k]))
+ break;
+ // k is the first one not in the subsequence
+ if (k-j > 1) {
+ void **mylist = NULL;
+ stb_arr_setlen(mylist, k-j);
+ memcpy(mylist, list+j, sizeof(list[j]) * (k-j));
+ stb_arr_push(sd->dupes, mylist);
+ }
+ j = k;
+ }
+ stb_arr_free(sd->hash_table[i]);
+ } else {
+
+ // process this row using eq(), O(N^2)
+ for (j=0; j < n; ++j) {
+ if (list[j] != NULL) {
+ void **output = NULL;
+ for (k=j+1; k < n; ++k) {
+ if (sd->eq(list[j], list[k])) {
+ if (output == NULL)
+ stb_arr_push(output, list[j]);
+ stb_arr_push(output, list[k]);
+ list[k] = NULL;
+ }
+ }
+ list[j] = NULL;
+ if (output)
+ stb_arr_push(sd->dupes, output);
+ }
+ }
+ stb_arr_free(sd->hash_table[i]);
+ }
+ }
+ }
+ free(sd->hash_table);
+ sd->hash_table = NULL;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// templatized Sort routine
+//
+// This is an attempt to implement a templated sorting algorithm.
+// To use it, you have to explicitly instantiate it as a _function_,
+// then you call that function. This allows the comparison to be inlined,
+// giving the sort similar performance to C++ sorts.
+//
+// It implements quicksort with three-way-median partitioning (generally
+// well-behaved), with a final insertion sort pass.
+//
+// When you define the compare expression, you should assume 'a' and 'b'
+// are pointers to two elements of your array, and perform the comparison
+// on those. Alternatively, you can use one or more statements: start the
+// expression with '0;', then write whatever code you want, and compute
+// the result into a variable named 'c'.
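+//
+// for example, an ascending sort of ints can be instantiated like this
+// (an illustrative sketch; the function name is your choice):
+//
+//    stb_define_sort(sort_int, int, *a < *b)
+//
+//    int data[5] = { 3, 1, 4, 1, 5 };
+//    sort_int(data, 5);    // data is now { 1, 1, 3, 4, 5 }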
+
+#define stb_declare_sort(FUNCNAME, TYPE) \
+ void FUNCNAME(TYPE *p, int n)
+#define stb_define_sort(FUNCNAME,TYPE,COMPARE) \
+ stb__define_sort( void, FUNCNAME,TYPE,COMPARE)
+#define stb_define_sort_static(FUNCNAME,TYPE,COMPARE) \
+ stb__define_sort(static void, FUNCNAME,TYPE,COMPARE)
+
+#define stb__define_sort(MODE, FUNCNAME, TYPE, COMPARE) \
+ \
+static void STB_(FUNCNAME,_ins_sort)(TYPE *p, int n) \
+{ \
+ int i,j; \
+ for (i=1; i < n; ++i) { \
+ TYPE t = p[i], *a = &t; \
+ j = i; \
+ while (j > 0) { \
+ TYPE *b = &p[j-1]; \
+ int c = COMPARE; \
+ if (!c) break; \
+ p[j] = p[j-1]; \
+ --j; \
+ } \
+ if (i != j) \
+ p[j] = t; \
+ } \
+} \
+ \
+static void STB_(FUNCNAME,_quicksort)(TYPE *p, int n) \
+{ \
+ /* threshold for transitioning to insertion sort */ \
+ while (n > 12) { \
+ TYPE *a,*b,t; \
+ int c01,c12,c,m,i,j; \
+ \
+ /* compute median of three */ \
+ m = n >> 1; \
+ a = &p[0]; \
+ b = &p[m]; \
+ c = COMPARE; \
+ c01 = c; \
+ a = &p[m]; \
+ b = &p[n-1]; \
+ c = COMPARE; \
+ c12 = c; \
+ /* if 0 >= mid >= end, or 0 < mid < end, then use mid */ \
+ if (c01 != c12) { \
+ /* otherwise, we'll need to swap something else to middle */ \
+ int z; \
+ a = &p[0]; \
+ b = &p[n-1]; \
+ c = COMPARE; \
+      /* 0>mid && mid<n:  0>n => n; 0<n => 0 */ \
+      /* 0<mid && mid>n:  0>n => 0; 0<n => n */ \
+ z = (c == c12) ? 0 : n-1; \
+ t = p[z]; \
+ p[z] = p[m]; \
+ p[m] = t; \
+ } \
+ /* now p[m] is the median-of-three */ \
+ /* swap it to the beginning so it won't move around */ \
+ t = p[0]; \
+ p[0] = p[m]; \
+ p[m] = t; \
+ \
+ /* partition loop */ \
+ i=1; \
+ j=n-1; \
+ for(;;) { \
+ /* handling of equality is crucial here */ \
+ /* for sentinels & efficiency with duplicates */ \
+ b = &p[0]; \
+ for (;;++i) { \
+ a=&p[i]; \
+ c = COMPARE; \
+ if (!c) break; \
+ } \
+ a = &p[0]; \
+ for (;;--j) { \
+ b=&p[j]; \
+ c = COMPARE; \
+ if (!c) break; \
+ } \
+ /* make sure we haven't crossed */ \
+ if (i >= j) break; \
+ t = p[i]; \
+ p[i] = p[j]; \
+ p[j] = t; \
+ \
+ ++i; \
+ --j; \
+ } \
+ /* recurse on smaller side, iterate on larger */ \
+ if (j < (n-i)) { \
+ STB_(FUNCNAME,_quicksort)(p,j); \
+ p = p+i; \
+ n = n-i; \
+ } else { \
+ STB_(FUNCNAME,_quicksort)(p+i, n-i); \
+ n = j; \
+ } \
+ } \
+} \
+ \
+MODE FUNCNAME(TYPE *p, int n) \
+{ \
+ STB_(FUNCNAME, _quicksort)(p, n); \
+ STB_(FUNCNAME, _ins_sort)(p, n); \
+} \
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_bitset an array of booleans indexed by integers
+//
+
+typedef stb_uint32 stb_bitset;
+
+STB_EXTERN stb_bitset *stb_bitset_new(int value, int len);
+
+#define stb_bitset_clearall(arr,len) (memset(arr, 0, 4 * (len)))
+#define stb_bitset_setall(arr,len) (memset(arr, 255, 4 * (len)))
+
+#define stb_bitset_setbit(arr,n)    ((arr)[(n) >> 5] |= (1u << ((n) & 31)))
+#define stb_bitset_clearbit(arr,n)  ((arr)[(n) >> 5] &= ~(1u << ((n) & 31)))
+#define stb_bitset_testbit(arr,n)   ((arr)[(n) >> 5] & (1u << ((n) & 31)))
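+
+// bits are stored 32 per word, so 'len' is in words: a bitset covering
+// n bits needs (n+31)/32 words. an illustrative sketch:
+//
+//    stb_bitset *bs = stb_bitset_new(0, 4);   // 4 words = 128 bits, all 0
+//    stb_bitset_setbit(bs, 100);
+//    if (stb_bitset_testbit(bs, 100)) { /* bit is set */ }
+//    stb_bitset_clearbit(bs, 100);
+//    free(bs);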
+
+STB_EXTERN stb_bitset *stb_bitset_union(stb_bitset *p0, stb_bitset *p1, int len);
+
+STB_EXTERN int *stb_bitset_getlist(stb_bitset *out, int start, int end);
+
+STB_EXTERN int stb_bitset_eq(stb_bitset *p0, stb_bitset *p1, int len);
+STB_EXTERN int stb_bitset_disjoint(stb_bitset *p0, stb_bitset *p1, int len);
+STB_EXTERN int stb_bitset_disjoint_0(stb_bitset *p0, stb_bitset *p1, int len);
+STB_EXTERN int stb_bitset_subset(stb_bitset *bigger, stb_bitset *smaller, int len);
+STB_EXTERN int stb_bitset_unioneq_changed(stb_bitset *p0, stb_bitset *p1, int len);
+
+#ifdef STB_DEFINE
+int stb_bitset_eq(stb_bitset *p0, stb_bitset *p1, int len)
+{
+ int i;
+ for (i=0; i < len; ++i)
+ if (p0[i] != p1[i]) return 0;
+ return 1;
+}
+
+int stb_bitset_disjoint(stb_bitset *p0, stb_bitset *p1, int len)
+{
+ int i;
+ for (i=0; i < len; ++i)
+ if (p0[i] & p1[i]) return 0;
+ return 1;
+}
+
+int stb_bitset_disjoint_0(stb_bitset *p0, stb_bitset *p1, int len)
+{
+ int i;
+ for (i=0; i < len; ++i)
+ if ((p0[i] | p1[i]) != 0xffffffff) return 0;
+ return 1;
+}
+
+int stb_bitset_subset(stb_bitset *bigger, stb_bitset *smaller, int len)
+{
+ int i;
+ for (i=0; i < len; ++i)
+ if ((bigger[i] & smaller[i]) != smaller[i]) return 0;
+ return 1;
+}
+
+stb_bitset *stb_bitset_union(stb_bitset *p0, stb_bitset *p1, int len)
+{
+ int i;
+ stb_bitset *d = (stb_bitset *) malloc(sizeof(*d) * len);
+ for (i=0; i < len; ++i) d[i] = p0[i] | p1[i];
+ return d;
+}
+
+int stb_bitset_unioneq_changed(stb_bitset *p0, stb_bitset *p1, int len)
+{
+ int i, changed=0;
+ for (i=0; i < len; ++i) {
+ stb_bitset d = p0[i] | p1[i];
+ if (d != p0[i]) {
+ p0[i] = d;
+ changed = 1;
+ }
+ }
+ return changed;
+}
+
+stb_bitset *stb_bitset_new(int value, int len)
+{
+ int i;
+ stb_bitset *d = (stb_bitset *) malloc(sizeof(*d) * len);
+ if (value) value = 0xffffffff;
+ for (i=0; i < len; ++i) d[i] = value;
+ return d;
+}
+
+int *stb_bitset_getlist(stb_bitset *out, int start, int end)
+{
+ int *list = NULL;
+ int i;
+ for (i=start; i < end; ++i)
+ if (stb_bitset_testbit(out, i))
+ stb_arr_push(list, i);
+ return list;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_wordwrap quality word-wrapping for fixed-width fonts
+//
+
+STB_EXTERN int stb_wordwrap(int *pairs, int pair_max, int count, char *str);
+STB_EXTERN int *stb_wordwrapalloc(int count, char *str);
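+
+// stb_wordwrap wraps 'str' to lines of at most 'count' columns, writing
+// a (start,length) pair of indices into 'pairs' for each output line;
+// it returns the number of lines, or -1 if more than 'pair_max' pairs
+// would be needed. pass pairs=NULL to just count the lines. an
+// illustrative sketch:
+//
+//    int pairs[64], n, i;
+//    n = stb_wordwrap(pairs, 32, 20, text);   // wrap 'text' to 20 columns
+//    for (i=0; i < n; ++i)
+//       printf("%.*s\n", pairs[2*i+1], text + pairs[2*i]);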
+
+#ifdef STB_DEFINE
+
+int stb_wordwrap(int *pairs, int pair_max, int count, char *str)
+{
+ int n=0,i=0, start=0,nonwhite=0;
+ if (pairs == NULL) pair_max = 0x7ffffff0;
+ else pair_max *= 2;
+ // parse
+ for(;;) {
+ int s=i; // first whitespace char; last nonwhite+1
+ int w; // word start
+ // accept whitespace
+ while (isspace(str[i])) {
+ if (str[i] == '\n' || str[i] == '\r') {
+ if (str[i] + str[i+1] == '\n' + '\r') ++i;
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = s-start;
+ n += 2;
+ nonwhite=0;
+ start = i+1;
+ s = start;
+ }
+ ++i;
+ }
+ if (i >= start+count) {
+ // we've gone off the end using whitespace
+ if (nonwhite) {
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = s-start;
+ n += 2;
+ start = s = i;
+ nonwhite=0;
+ } else {
+ // output all the whitespace
+ while (i >= start+count) {
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = count;
+ n += 2;
+ start += count;
+ }
+ s = start;
+ }
+ }
+
+ if (str[i] == 0) break;
+ // now scan out a word and see if it fits
+ w = i;
+ while (str[i] && !isspace(str[i])) {
+ ++i;
+ }
+ // wrapped?
+ if (i > start + count) {
+ // huge?
+ if (i-s <= count) {
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = s-start;
+ n += 2;
+ start = w;
+ } else {
+            // This word is longer than one line. If we wrap it onto N
+            // lines there are leftover chars. Do those chars fit on the
+            // current line? But if we have leading whitespace, we force
+            // the word to start here.
+ if ((w-start) + ((i-w) % count) <= count || !nonwhite) {
+ // output a full line
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = count;
+ n += 2;
+ start += count;
+ w = start;
+ } else {
+ // output a partial line, trimming trailing whitespace
+ if (s != start) {
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = s-start;
+ n += 2;
+ start = w;
+ }
+ }
+ // now output full lines as needed
+ while (start + count <= i) {
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = count;
+ n += 2;
+ start += count;
+ }
+ }
+ }
+ nonwhite=1;
+ }
+ if (start < i) {
+ if (n >= pair_max) return -1;
+ if (pairs) pairs[n] = start, pairs[n+1] = i-start;
+ n += 2;
+ }
+ return n>>1;
+}
+
+int *stb_wordwrapalloc(int count, char *str)
+{
+ int n = stb_wordwrap(NULL,0,count,str);
+ int *z = NULL;
+ stb_arr_setlen(z, n*2);
+ stb_wordwrap(z, n, count, str);
+ return z;
+}
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// stb_match: wildcards and regexping
+//
+
+STB_EXTERN int stb_wildmatch (char *expr, char *candidate);
+STB_EXTERN int stb_wildmatchi(char *expr, char *candidate);
+STB_EXTERN int stb_wildfind (char *expr, char *candidate);
+STB_EXTERN int stb_wildfindi (char *expr, char *candidate);
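+
+// in the wildcard functions, '?' matches any single character and '*'
+// matches any run of characters; several alternative patterns can be
+// separated with ';'. the *match functions test the whole candidate
+// and return nonzero on success; the *find functions search anywhere
+// in the candidate and return the match offset (or -1); the trailing
+// 'i' variants are case-insensitive. an illustrative sketch:
+//
+//    stb_wildmatch("*.c;*.h", "foo.h");    // nonzero: "*.h" matches
+//    stb_wildfindi("BAR", "foobarbaz");    // 3: found at offset 3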
+
+STB_EXTERN int stb_regex(char *regex, char *candidate);
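+
+// the regex dialect (see stb__reg_parse below) supports literal
+// characters, '.', character classes like [a-z] and [^a-z], '?', '*',
+// '+', grouping with (), alternation with '|', a leading '^' anchor,
+// '$' (matched as a literal newline), and '\' escapes; {n,m} counts
+// are NOT supported.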
+
+typedef struct stb_matcher stb_matcher;
+
+STB_EXTERN stb_matcher *stb_regex_matcher(char *regex);
+STB_EXTERN int stb_matcher_match(stb_matcher *m, char *str);
+STB_EXTERN int stb_matcher_find(stb_matcher *m, char *str);
+STB_EXTERN void stb_matcher_free(stb_matcher *f);
+
+STB_EXTERN stb_matcher *stb_lex_matcher(void);
+STB_EXTERN int stb_lex_item(stb_matcher *m, const char *str, int result);
+STB_EXTERN int stb_lex_item_wild(stb_matcher *matcher, const char *regex, int result);
+STB_EXTERN int stb_lex(stb_matcher *m, char *str, int *len);
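+
+// the lexer interface: create a matcher, register (regex, result) pairs,
+// then call stb_lex on the input; it returns the 'result' value for the
+// longest registered item matching at the start of 'str' and stores the
+// match length in *len. an illustrative sketch (T_ID and T_NUM are
+// hypothetical token codes):
+//
+//    stb_matcher *lex = stb_lex_matcher();
+//    stb_lex_item(lex, "[a-zA-Z_][a-zA-Z0-9_]*", T_ID);
+//    stb_lex_item(lex, "[0-9]+", T_NUM);
+//    ...
+//    token = stb_lex(lex, cursor, &length);
+//    cursor += length;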
+
+
+
+#ifdef STB_DEFINE
+
+static int stb__match_qstring(char *candidate, char *qstring, int qlen, int insensitive)
+{
+ int i;
+ if (insensitive) {
+ for (i=0; i < qlen; ++i)
+ if (qstring[i] == '?') {
+ if (!candidate[i]) return 0;
+ } else
+ if (tolower(qstring[i]) != tolower(candidate[i]))
+ return 0;
+ } else {
+ for (i=0; i < qlen; ++i)
+ if (qstring[i] == '?') {
+ if (!candidate[i]) return 0;
+ } else
+ if (qstring[i] != candidate[i])
+ return 0;
+ }
+ return 1;
+}
+
+static int stb__find_qstring(char *candidate, char *qstring, int qlen, int insensitive)
+{
+ char c;
+
+ int offset=0;
+ while (*qstring == '?') {
+ ++qstring;
+ --qlen;
+ ++candidate;
+ if (qlen == 0) return 0;
+ if (*candidate == 0) return -1;
+ }
+
+ c = *qstring++;
+ --qlen;
+ if (insensitive) c = tolower(c);
+
+ while (candidate[offset]) {
+ if (c == (insensitive ? tolower(candidate[offset]) : candidate[offset]))
+ if (stb__match_qstring(candidate+offset+1, qstring, qlen, insensitive))
+ return offset;
+ ++offset;
+ }
+
+ return -1;
+}
+
+int stb__wildmatch_raw2(char *expr, char *candidate, int search, int insensitive)
+{
+ int where=0;
+ int start = -1;
+
+ if (!search) {
+ // parse to first '*'
+ if (*expr != '*')
+ start = 0;
+ while (*expr != '*') {
+ if (!*expr)
+ return *candidate == 0 ? 0 : -1;
+ if (*expr == '?') {
+ if (!*candidate) return -1;
+ } else {
+ if (insensitive) {
+ if (tolower(*candidate) != tolower(*expr))
+ return -1;
+ } else
+ if (*candidate != *expr)
+ return -1;
+ }
+ ++candidate, ++expr, ++where;
+ }
+ } else {
+ // 0-length search string
+ if (!*expr)
+ return 0;
+ }
+
+ assert(search || *expr == '*');
+ if (!search)
+ ++expr;
+
+ // implicit '*' at this point
+
+ while (*expr) {
+ int o=0;
+ // combine redundant * characters
+ while (expr[0] == '*') ++expr;
+
+ // ok, at this point, expr[-1] == '*',
+ // and expr[0] != '*'
+
+ if (!expr[0]) return start >= 0 ? start : 0;
+
+ // now find next '*'
+ o = 0;
+ while (expr[o] != '*') {
+ if (expr[o] == 0)
+ break;
+ ++o;
+ }
+ // if no '*', scan to end, then match at end
+ if (expr[o] == 0 && !search) {
+ int z;
+ for (z=0; z < o; ++z)
+ if (candidate[z] == 0)
+ return -1;
+ while (candidate[z])
+ ++z;
+ // ok, now check if they match
+ if (stb__match_qstring(candidate+z-o, expr, o, insensitive))
+ return start >= 0 ? start : 0;
+ return -1;
+ } else {
+ // if yes '*', then do stb__find_qmatch on the intervening chars
+ int n = stb__find_qstring(candidate, expr, o, insensitive);
+ if (n < 0)
+ return -1;
+ if (start < 0)
+ start = where + n;
+ expr += o;
+ candidate += n+o;
+ }
+
+ if (*expr == 0) {
+ assert(search);
+ return start;
+ }
+
+ assert(*expr == '*');
+ ++expr;
+ }
+
+ return start >= 0 ? start : 0;
+}
+
+int stb__wildmatch_raw(char *expr, char *candidate, int search, int insensitive)
+{
+ char buffer[256];
+ // handle multiple search strings
+ char *s = strchr(expr, ';');
+ char *last = expr;
+ while (s) {
+ int z;
+ // need to allow for non-writeable strings... assume they're small
+ if (s - last < 256) {
+ stb_strncpy(buffer, last, (int) (s-last+1));
+ buffer[s-last] = 0;
+ z = stb__wildmatch_raw2(buffer, candidate, search, insensitive);
+ } else {
+ *s = 0;
+ z = stb__wildmatch_raw2(last, candidate, search, insensitive);
+ *s = ';';
+ }
+ if (z >= 0) return z;
+ last = s+1;
+ s = strchr(last, ';');
+ }
+ return stb__wildmatch_raw2(last, candidate, search, insensitive);
+}
+
+int stb_wildmatch(char *expr, char *candidate)
+{
+ return stb__wildmatch_raw(expr, candidate, 0,0) >= 0;
+}
+
+int stb_wildmatchi(char *expr, char *candidate)
+{
+ return stb__wildmatch_raw(expr, candidate, 0,1) >= 0;
+}
+
+int stb_wildfind(char *expr, char *candidate)
+{
+ return stb__wildmatch_raw(expr, candidate, 1,0);
+}
+
+int stb_wildfindi(char *expr, char *candidate)
+{
+ return stb__wildmatch_raw(expr, candidate, 1,1);
+}
+
+typedef struct
+{
+ stb_int16 transition[256];
+} stb_dfa;
+
+// an NFA node represents a state you're in; it then has
+// an arbitrary number of edges dangling off of it
+// note this isn't utf8-y
+typedef struct
+{
+ stb_int16 match; // character/set to match
+ stb_uint16 node; // output node to go to
+} stb_nfa_edge;
+
+typedef struct
+{
+ stb_int16 goal; // does reaching this win the prize?
+ stb_uint8 active; // is this in the active list
+ stb_nfa_edge *out;
+ stb_uint16 *eps; // list of epsilon closures
+} stb_nfa_node;
+
+#define STB__DFA_UNDEF -1
+#define STB__DFA_GOAL -2
+#define STB__DFA_END -3
+#define STB__DFA_MGOAL -4
+#define STB__DFA_VALID 0
+
+#define STB__NFA_STOP_GOAL -1
+
+// compiled regexp
+struct stb_matcher
+{
+ stb_uint16 start_node;
+ stb_int16 dfa_start;
+ stb_uint32 *charset;
+ int num_charset;
+ int match_start;
+ stb_nfa_node *nodes;
+ int does_lex;
+
+ // dfa matcher
+ stb_dfa * dfa;
+ stb_uint32 * dfa_mapping;
+ stb_int16 * dfa_result;
+ int num_words_per_dfa;
+};
+
+static int stb__add_node(stb_matcher *matcher)
+{
+ stb_nfa_node z;
+ z.active = 0;
+ z.eps = 0;
+ z.goal = 0;
+ z.out = 0;
+ stb_arr_push(matcher->nodes, z);
+ return stb_arr_len(matcher->nodes)-1;
+}
+
+static void stb__add_epsilon(stb_matcher *matcher, int from, int to)
+{
+ assert(from != to);
+ if (matcher->nodes[from].eps == NULL)
+ stb_arr_malloc((void **) &matcher->nodes[from].eps, matcher);
+ stb_arr_push(matcher->nodes[from].eps, to);
+}
+
+static void stb__add_edge(stb_matcher *matcher, int from, int to, int type)
+{
+ stb_nfa_edge z = { (stb_int16)type, (stb_uint16)to };
+ if (matcher->nodes[from].out == NULL)
+ stb_arr_malloc((void **) &matcher->nodes[from].out, matcher);
+ stb_arr_push(matcher->nodes[from].out, z);
+}
+
+static char *stb__reg_parse_alt(stb_matcher *m, int s, char *r, stb_uint16 *e);
+static char *stb__reg_parse(stb_matcher *matcher, int start, char *regex, stb_uint16 *end)
+{
+ int n;
+ int last_start = -1;
+ stb_uint16 last_end = start;
+
+ while (*regex) {
+ switch (*regex) {
+ case '(':
+ last_start = last_end;
+ regex = stb__reg_parse_alt(matcher, last_end, regex+1, &last_end);
+ if (regex == NULL || *regex != ')')
+ return NULL;
+ ++regex;
+ break;
+
+ case '|':
+ case ')':
+ *end = last_end;
+ return regex;
+
+ case '?':
+ if (last_start < 0) return NULL;
+ stb__add_epsilon(matcher, last_start, last_end);
+ ++regex;
+ break;
+
+ case '*':
+ if (last_start < 0) return NULL;
+ stb__add_epsilon(matcher, last_start, last_end);
+
+ // fall through
+
+ case '+':
+ if (last_start < 0) return NULL;
+ stb__add_epsilon(matcher, last_end, last_start);
+ // prevent links back to last_end from chaining to last_start
+ n = stb__add_node(matcher);
+ stb__add_epsilon(matcher, last_end, n);
+ last_end = n;
+ ++regex;
+ break;
+
+ case '{': // not supported!
+ // @TODO: given {n,m}, clone last_start to last_end m times,
+ // and include epsilons from start to first m-n blocks
+ return NULL;
+
+ case '\\':
+ ++regex;
+ if (!*regex) return NULL;
+
+ // fallthrough
+ default: // match exactly this character
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, *regex);
+ last_start = last_end;
+ last_end = n;
+ ++regex;
+ break;
+
+ case '$':
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, '\n');
+ last_start = last_end;
+ last_end = n;
+ ++regex;
+ break;
+
+ case '.':
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, -1);
+ last_start = last_end;
+ last_end = n;
+ ++regex;
+ break;
+
+ case '[': {
+ stb_uint8 flags[256];
+ int invert = 0,z;
+ ++regex;
+ if (matcher->num_charset == 0) {
+               matcher->charset = (stb_uint32 *) stb_malloc(matcher, sizeof(*matcher->charset) * 256);
+ memset(matcher->charset, 0, sizeof(*matcher->charset) * 256);
+ }
+
+ memset(flags,0,sizeof(flags));
+
+ // leading ^ is special
+ if (*regex == '^')
+ ++regex, invert = 1;
+
+ // leading ] is special
+ if (*regex == ']') {
+ flags[(int) ']'] = 1;
+ ++regex;
+ }
+ while (*regex != ']') {
+ stb_uint a;
+ if (!*regex) return NULL;
+ a = *regex++;
+ if (regex[0] == '-' && regex[1] != ']') {
+ stb_uint i,b = regex[1];
+ regex += 2;
+ if (b == 0) return NULL;
+ if (a > b) return NULL;
+ for (i=a; i <= b; ++i)
+ flags[i] = 1;
+ } else
+ flags[a] = 1;
+ }
+ ++regex;
+ if (invert) {
+ int i;
+ for (i=0; i < 256; ++i)
+ flags[i] = 1-flags[i];
+ }
+
+ // now check if any existing charset matches
+ for (z=0; z < matcher->num_charset; ++z) {
+ int i, k[2] = { 0, 1 << z};
+ for (i=0; i < 256; ++i) {
+ unsigned int f = k[flags[i]];
+ if ((matcher->charset[i] & k[1]) != f)
+ break;
+ }
+ if (i == 256) break;
+ }
+
+ if (z == matcher->num_charset) {
+ int i;
+ ++matcher->num_charset;
+ if (matcher->num_charset > 32) {
+ assert(0); /* NOTREACHED */
+ return NULL; // too many charsets, oops
+ }
+ for (i=0; i < 256; ++i)
+ if (flags[i])
+ matcher->charset[i] |= (1 << z);
+ }
+
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, -2 - z);
+ last_start = last_end;
+ last_end = n;
+ break;
+ }
+ }
+ }
+ *end = last_end;
+ return regex;
+}
+
+static char *stb__reg_parse_alt(stb_matcher *matcher, int start, char *regex, stb_uint16 *end)
+{
+ stb_uint16 last_end = start;
+ stb_uint16 main_end;
+
+ int head, tail;
+
+ head = stb__add_node(matcher);
+ stb__add_epsilon(matcher, start, head);
+
+ regex = stb__reg_parse(matcher, head, regex, &last_end);
+ if (regex == NULL) return NULL;
+ if (*regex == 0 || *regex == ')') {
+ *end = last_end;
+ return regex;
+ }
+
+ main_end = last_end;
+ tail = stb__add_node(matcher);
+
+ stb__add_epsilon(matcher, last_end, tail);
+
+ // start alternatives from the same starting node; use epsilon
+ // transitions to combine their endings
+ while(*regex && *regex != ')') {
+ assert(*regex == '|');
+ head = stb__add_node(matcher);
+ stb__add_epsilon(matcher, start, head);
+ regex = stb__reg_parse(matcher, head, regex+1, &last_end);
+ if (regex == NULL)
+ return NULL;
+ stb__add_epsilon(matcher, last_end, tail);
+ }
+
+ *end = tail;
+ return regex;
+}
+
+static char *stb__wild_parse(stb_matcher *matcher, int start, char *str, stb_uint16 *end)
+{
+ int n;
+ stb_uint16 last_end;
+
+ last_end = stb__add_node(matcher);
+ stb__add_epsilon(matcher, start, last_end);
+
+ while (*str) {
+      switch (*str) {
+         default: // match exactly this character
+ n = stb__add_node(matcher);
+ if (toupper(*str) == tolower(*str)) {
+ stb__add_edge(matcher, last_end, n, *str);
+ } else {
+ stb__add_edge(matcher, last_end, n, tolower(*str));
+ stb__add_edge(matcher, last_end, n, toupper(*str));
+ }
+ last_end = n;
+ ++str;
+ break;
+
+ case '?':
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, -1);
+ last_end = n;
+ ++str;
+ break;
+
+ case '*':
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, -1);
+ stb__add_epsilon(matcher, last_end, n);
+ stb__add_epsilon(matcher, n, last_end);
+ last_end = n;
+ ++str;
+ break;
+ }
+ }
+
+ // now require end of string to match
+ n = stb__add_node(matcher);
+ stb__add_edge(matcher, last_end, n, 0);
+ last_end = n;
+
+ *end = last_end;
+ return str;
+}
+
+static int stb__opt(stb_matcher *m, int n)
+{
+ for(;;) {
+ stb_nfa_node *p = &m->nodes[n];
+ if (p->goal) return n;
+ if (stb_arr_len(p->out)) return n;
+ if (stb_arr_len(p->eps) != 1) return n;
+ n = p->eps[0];
+ }
+}
+
+static void stb__optimize(stb_matcher *m)
+{
+ // if the target of any edge is a node with exactly
+ // one out-epsilon, shorten it
+ int i,j;
+ for (i=0; i < stb_arr_len(m->nodes); ++i) {
+ stb_nfa_node *p = &m->nodes[i];
+ for (j=0; j < stb_arr_len(p->out); ++j)
+ p->out[j].node = stb__opt(m,p->out[j].node);
+ for (j=0; j < stb_arr_len(p->eps); ++j)
+ p->eps[j] = stb__opt(m,p->eps[j] );
+ }
+ m->start_node = stb__opt(m,m->start_node);
+}
+
+void stb_matcher_free(stb_matcher *f)
+{
+ stb_free(f);
+}
+
+static stb_matcher *stb__alloc_matcher(void)
+{
+ stb_matcher *matcher = (stb_matcher *) stb_malloc(0,sizeof(*matcher));
+
+ matcher->start_node = 0;
+ stb_arr_malloc((void **) &matcher->nodes, matcher);
+ matcher->num_charset = 0;
+ matcher->match_start = 0;
+ matcher->does_lex = 0;
+
+ matcher->dfa_start = STB__DFA_UNDEF;
+ stb_arr_malloc((void **) &matcher->dfa, matcher);
+ stb_arr_malloc((void **) &matcher->dfa_mapping, matcher);
+ stb_arr_malloc((void **) &matcher->dfa_result, matcher);
+
+ stb__add_node(matcher);
+
+ return matcher;
+}
+
+static void stb__lex_reset(stb_matcher *matcher)
+{
+ // flush cached dfa data
+ stb_arr_setlen(matcher->dfa, 0);
+ stb_arr_setlen(matcher->dfa_mapping, 0);
+ stb_arr_setlen(matcher->dfa_result, 0);
+ matcher->dfa_start = STB__DFA_UNDEF;
+}
+
+stb_matcher *stb_regex_matcher(char *regex)
+{
+ char *z;
+ stb_uint16 end;
+ stb_matcher *matcher = stb__alloc_matcher();
+ if (*regex == '^') {
+ matcher->match_start = 1;
+ ++regex;
+ }
+
+ z = stb__reg_parse_alt(matcher, matcher->start_node, regex, &end);
+
+ if (!z || *z) {
+ stb_free(matcher);
+ return NULL;
+ }
+
+ ((matcher->nodes)[(int) end]).goal = STB__NFA_STOP_GOAL;
+
+ return matcher;
+}
+
+stb_matcher *stb_lex_matcher(void)
+{
+ stb_matcher *matcher = stb__alloc_matcher();
+
+ matcher->match_start = 1;
+ matcher->does_lex = 1;
+
+ return matcher;
+}
+
+int stb_lex_item(stb_matcher *matcher, const char *regex, int result)
+{
+ char *z;
+ stb_uint16 end;
+
+ z = stb__reg_parse_alt(matcher, matcher->start_node, (char*) regex, &end);
+
+ if (z == NULL)
+ return 0;
+
+ stb__lex_reset(matcher);
+
+ matcher->nodes[(int) end].goal = result;
+ return 1;
+}
+
+int stb_lex_item_wild(stb_matcher *matcher, const char *regex, int result)
+{
+ char *z;
+ stb_uint16 end;
+
+ z = stb__wild_parse(matcher, matcher->start_node, (char*) regex, &end);
+
+ if (z == NULL)
+ return 0;
+
+ stb__lex_reset(matcher);
+
+ matcher->nodes[(int) end].goal = result;
+ return 1;
+}
+
+static void stb__clear(stb_matcher *m, stb_uint16 *list)
+{
+ int i;
+ for (i=0; i < stb_arr_len(list); ++i)
+ m->nodes[(int) list[i]].active = 0;
+}
+
+static int stb__clear_goalcheck(stb_matcher *m, stb_uint16 *list)
+{
+ int i, t=0;
+ for (i=0; i < stb_arr_len(list); ++i) {
+ t += m->nodes[(int) list[i]].goal;
+ m->nodes[(int) list[i]].active = 0;
+ }
+ return t;
+}
+
+static stb_uint16 * stb__add_if_inactive(stb_matcher *m, stb_uint16 *list, int n)
+{
+ if (!m->nodes[n].active) {
+ stb_arr_push(list, n);
+ m->nodes[n].active = 1;
+ }
+ return list;
+}
+
+static stb_uint16 * stb__eps_closure(stb_matcher *m, stb_uint16 *list)
+{
+ int i,n = stb_arr_len(list);
+
+ for(i=0; i < n; ++i) {
+ stb_uint16 *e = m->nodes[(int) list[i]].eps;
+ if (e) {
+ int j,k = stb_arr_len(e);
+ for (j=0; j < k; ++j)
+ list = stb__add_if_inactive(m, list, e[j]);
+ n = stb_arr_len(list);
+ }
+ }
+
+ return list;
+}
+
+int stb_matcher_match(stb_matcher *m, char *str)
+{
+ int result = 0;
+ int i,j,y,z;
+ stb_uint16 *previous = NULL;
+ stb_uint16 *current = NULL;
+ stb_uint16 *temp;
+
+ stb_arr_setsize(previous, 4);
+ stb_arr_setsize(current, 4);
+
+ previous = stb__add_if_inactive(m, previous, m->start_node);
+ previous = stb__eps_closure(m,previous);
+ stb__clear(m, previous);
+
+ while (*str && stb_arr_len(previous)) {
+ y = stb_arr_len(previous);
+ for (i=0; i < y; ++i) {
+ stb_nfa_node *n = &m->nodes[(int) previous[i]];
+ z = stb_arr_len(n->out);
+ for (j=0; j < z; ++j) {
+ if (n->out[j].match >= 0) {
+ if (n->out[j].match == *str)
+ current = stb__add_if_inactive(m, current, n->out[j].node);
+ } else if (n->out[j].match == -1) {
+ if (*str != '\n')
+ current = stb__add_if_inactive(m, current, n->out[j].node);
+ } else if (n->out[j].match < -1) {
+ int z = -n->out[j].match - 2;
+ if (m->charset[(stb_uint8) *str] & (1 << z))
+ current = stb__add_if_inactive(m, current, n->out[j].node);
+ }
+ }
+ }
+ stb_arr_setlen(previous, 0);
+
+ temp = previous;
+ previous = current;
+ current = temp;
+
+ previous = stb__eps_closure(m,previous);
+ stb__clear(m, previous);
+
+ ++str;
+ }
+
+ // transition to pick up a '$' at the end
+ y = stb_arr_len(previous);
+ for (i=0; i < y; ++i)
+ m->nodes[(int) previous[i]].active = 1;
+
+ for (i=0; i < y; ++i) {
+ stb_nfa_node *n = &m->nodes[(int) previous[i]];
+ z = stb_arr_len(n->out);
+ for (j=0; j < z; ++j) {
+ if (n->out[j].match == '\n')
+ current = stb__add_if_inactive(m, current, n->out[j].node);
+ }
+ }
+
+ previous = stb__eps_closure(m,previous);
+ stb__clear(m, previous);
+
+ y = stb_arr_len(previous);
+ for (i=0; i < y; ++i)
+ if (m->nodes[(int) previous[i]].goal)
+ result = 1;
+
+ stb_arr_free(previous);
+ stb_arr_free(current);
+
+ return result && *str == 0;
+}
+
+stb_int16 stb__get_dfa_node(stb_matcher *m, stb_uint16 *list)
+{
+ stb_uint16 node;
+ stb_uint32 data[8], *state, *newstate;
+ int i,j,n;
+
+ state = (stb_uint32 *) stb_temp(data, m->num_words_per_dfa * 4);
+ memset(state, 0, m->num_words_per_dfa*4);
+
+ n = stb_arr_len(list);
+ for (i=0; i < n; ++i) {
+ int x = list[i];
+ state[x >> 5] |= 1 << (x & 31);
+ }
+
+ // @TODO use a hash table
+ n = stb_arr_len(m->dfa_mapping);
+ i=j=0;
+ for(; j < n; ++i, j += m->num_words_per_dfa) {
+ // @TODO special case for <= 32
+ if (!memcmp(state, m->dfa_mapping + j, m->num_words_per_dfa*4)) {
+ node = i;
+ goto done;
+ }
+ }
+
+ assert(stb_arr_len(m->dfa) == i);
+ node = i;
+
+ newstate = stb_arr_addn(m->dfa_mapping, m->num_words_per_dfa);
+ memcpy(newstate, state, m->num_words_per_dfa*4);
+
+ // set all transitions to 'unknown'
+ stb_arr_add(m->dfa);
+ memset(m->dfa[i].transition, -1, sizeof(m->dfa[i].transition));
+
+ if (m->does_lex) {
+ int result = -1;
+ n = stb_arr_len(list);
+ for (i=0; i < n; ++i) {
+ if (m->nodes[(int) list[i]].goal > result)
+ result = m->nodes[(int) list[i]].goal;
+ }
+
+ stb_arr_push(m->dfa_result, result);
+ }
+
+done:
+ stb_tempfree(data, state);
+ return node;
+}
+
+static int stb__matcher_dfa(stb_matcher *m, char *str_c, int *len)
+{
+ stb_uint8 *str = (stb_uint8 *) str_c;
+ stb_int16 node,prevnode;
+ stb_dfa *trans;
+ int match_length = 0;
+ stb_int16 match_result=0;
+
+ if (m->dfa_start == STB__DFA_UNDEF) {
+ stb_uint16 *list;
+
+ m->num_words_per_dfa = (stb_arr_len(m->nodes)+31) >> 5;
+ stb__optimize(m);
+
+ list = stb__add_if_inactive(m, NULL, m->start_node);
+ list = stb__eps_closure(m,list);
+ if (m->does_lex) {
+ m->dfa_start = stb__get_dfa_node(m,list);
+ stb__clear(m, list);
+ // DON'T allow start state to be a goal state!
+ // this allows people to specify regexes that can match 0
+ // characters without them actually matching (also, we don't
+ // check _before_ advancing anyway)
+ if (m->dfa_start <= STB__DFA_MGOAL)
+ m->dfa_start = -(m->dfa_start - STB__DFA_MGOAL);
+ } else {
+ if (stb__clear_goalcheck(m, list))
+ m->dfa_start = STB__DFA_GOAL;
+ else
+ m->dfa_start = stb__get_dfa_node(m,list);
+ }
+ stb_arr_free(list);
+ }
+
+ prevnode = STB__DFA_UNDEF;
+ node = m->dfa_start;
+ trans = m->dfa;
+
+ if (m->dfa_start == STB__DFA_GOAL)
+ return 1;
+
+ for(;;) {
+ assert(node >= STB__DFA_VALID);
+
+ // fast inner DFA loop; especially if STB__DFA_VALID is 0
+
+ do {
+ prevnode = node;
+ node = trans[node].transition[*str++];
+ } while (node >= STB__DFA_VALID);
+
+ assert(node >= STB__DFA_MGOAL - stb_arr_len(m->dfa));
+ assert(node < stb_arr_len(m->dfa));
+
+ // special case for lex: need _longest_ match, so notice goal
+ // state without stopping
+ if (node <= STB__DFA_MGOAL) {
+ match_length = (int) (str - (stb_uint8 *) str_c);
+ node = -(node - STB__DFA_MGOAL);
+ match_result = node;
+ continue;
+ }
+
+ // slow NFA->DFA conversion
+
+ // or we hit the goal or the end of the string, but those
+ // can only happen once per search...
+
+ if (node == STB__DFA_UNDEF) {
+ // build a list -- @TODO special case <= 32 states
+ // heck, use a more compact data structure for <= 16 and <= 8 ?!
+
+ // @TODO keep states/newstates around instead of reallocating them
+ stb_uint16 *states = NULL;
+ stb_uint16 *newstates = NULL;
+ int i,j,y,z;
+ stb_uint32 *flags = &m->dfa_mapping[prevnode * m->num_words_per_dfa];
+ assert(prevnode != STB__DFA_UNDEF);
+ stb_arr_setsize(states, 4);
+ stb_arr_setsize(newstates,4);
+ for (j=0; j < m->num_words_per_dfa; ++j) {
+ for (i=0; i < 32; ++i) {
+ if (*flags & (1 << i))
+ stb_arr_push(states, j*32+i);
+ }
+ ++flags;
+ }
+ // 'states' now holds the NFA states we were in at the previous node,
+ // so we can compute which node it transitions to on str[-1]
+
+ y = stb_arr_len(states);
+ for (i=0; i < y; ++i) {
+ stb_nfa_node *n = &m->nodes[(int) states[i]];
+ z = stb_arr_len(n->out);
+ for (j=0; j < z; ++j) {
+ if (n->out[j].match >= 0) {
+ if (n->out[j].match == str[-1] || (str[-1] == 0 && n->out[j].match == '\n'))
+ newstates = stb__add_if_inactive(m, newstates, n->out[j].node);
+ } else if (n->out[j].match == -1) {
+ if (str[-1] != '\n' && str[-1])
+ newstates = stb__add_if_inactive(m, newstates, n->out[j].node);
+ } else if (n->out[j].match < -1) {
+ int z = -n->out[j].match - 2;
+ if (m->charset[str[-1]] & (1 << z))
+ newstates = stb__add_if_inactive(m, newstates, n->out[j].node);
+ }
+ }
+ }
+ // AND add in the start state!
+ if (!m->match_start || (str[-1] == '\n' && !m->does_lex))
+ newstates = stb__add_if_inactive(m, newstates, m->start_node);
+ // AND epsilon close it
+ newstates = stb__eps_closure(m, newstates);
+ // if it's a goal state, then that's all there is to it
+ if (stb__clear_goalcheck(m, newstates)) {
+ if (m->does_lex) {
+ match_length = (int) (str - (stb_uint8 *) str_c);
+ node = stb__get_dfa_node(m,newstates);
+ match_result = node;
+ node = -node + STB__DFA_MGOAL;
+ trans = m->dfa; // could have gotten realloc()ed
+ } else
+ node = STB__DFA_GOAL;
+ } else if (str[-1] == 0 || stb_arr_len(newstates) == 0) {
+ node = STB__DFA_END;
+ } else {
+ node = stb__get_dfa_node(m,newstates);
+ trans = m->dfa; // could have gotten realloc()ed
+ }
+ trans[prevnode].transition[str[-1]] = node;
+ if (node <= STB__DFA_MGOAL)
+ node = -(node - STB__DFA_MGOAL);
+ stb_arr_free(newstates);
+ stb_arr_free(states);
+ }
+
+ if (node == STB__DFA_GOAL) {
+ return 1;
+ }
+ if (node == STB__DFA_END) {
+ if (m->does_lex) {
+ if (match_result) {
+ if (len) *len = match_length;
+ return m->dfa_result[(int) match_result];
+ }
+ }
+ return 0;
+ }
+
+ assert(node != STB__DFA_UNDEF);
+ }
+}
+
+int stb_matcher_find(stb_matcher *m, char *str)
+{
+ assert(m->does_lex == 0);
+ return stb__matcher_dfa(m, str, NULL);
+}
+
+int stb_lex(stb_matcher *m, char *str, int *len)
+{
+ assert(m->does_lex);
+ return stb__matcher_dfa(m, str, len);
+}
+
+#ifdef STB_PERFECT_HASH
+int stb_regex(char *regex, char *str)
+{
+ static stb_perfect p;
+ static stb_matcher ** matchers;
+ static char ** regexps;
+ static char ** regexp_cache;
+ static unsigned short *mapping;
+ int z = stb_perfect_hash(&p, (int)(size_t) regex);
+ if (z >= 0) {
+ if (strcmp(regex, regexp_cache[(int) mapping[z]])) {
+ int i = mapping[z];
+ stb_matcher_free(matchers[i]);
+ free(regexp_cache[i]);
+ regexps[i] = regex;
+ regexp_cache[i] = stb_p_strdup(regex);
+ matchers[i] = stb_regex_matcher(regex);
+ }
+ } else {
+ int i,n;
+ if (regex == NULL) {
+ for (i=0; i < stb_arr_len(matchers); ++i) {
+ stb_matcher_free(matchers[i]);
+ free(regexp_cache[i]);
+ }
+ stb_arr_free(matchers);
+ stb_arr_free(regexps);
+ stb_arr_free(regexp_cache);
+ stb_perfect_destroy(&p);
+ free(mapping); mapping = NULL;
+ return -1;
+ }
+ stb_arr_push(regexps, regex);
+ stb_arr_push(regexp_cache, stb_p_strdup(regex));
+ stb_arr_push(matchers, stb_regex_matcher(regex));
+ stb_perfect_destroy(&p);
+ n = stb_perfect_create(&p, (unsigned int *) (char **) regexps, stb_arr_len(regexps));
+ mapping = (unsigned short *) realloc(mapping, n * sizeof(*mapping));
+ for (i=0; i < stb_arr_len(regexps); ++i)
+ mapping[stb_perfect_hash(&p, (int)(size_t) regexps[i])] = i;
+ z = stb_perfect_hash(&p, (int)(size_t) regex);
+ }
+ return stb_matcher_find(matchers[(int) mapping[z]], str);
+}
+#endif
+#endif // STB_DEFINE
+
+
+#if 0
+//////////////////////////////////////////////////////////////////////////////
+//
+// C source-code introspection
+//
+
+// runtime structure
+typedef struct
+{
+ char *name;
+ char *type; // base type
+ char *comment; // content of comment field
+ int size; // size of base type
+ int offset; // field offset
+ int arrcount[8]; // array sizes; -1 = pointer indirection; 0 = end of list
+} stb_info_field;
+
+typedef struct
+{
+ char *structname;
+ int size;
+ int num_fields;
+ stb_info_field *fields;
+} stb_info_struct;
+
+extern stb_info_struct stb__introspect_output[];
+
+//
+
+STB_EXTERN void stb_introspect_precompiled(stb_info_struct *compiled);
+STB_EXTERN void stb__introspect(char *path, char *file, stb_info_struct *compiled);
+
+#define stb_introspect_ship() stb__introspect(NULL, NULL, stb__introspect_output)
+
+#ifdef STB_SHIP
+#define stb_introspect() stb_introspect_ship()
+#define stb_introspect_path(p) stb_introspect_ship()
+#else
+// bootstrapping: define stb_introspect() (or 'path') the first time
+#define stb_introspect() stb__introspect(NULL, __FILE__, NULL)
+#define stb_introspect_auto() stb__introspect(NULL, __FILE__, stb__introspect_output)
+
+#define stb_introspect_path(p) stb__introspect(p, __FILE__, NULL)
+#endif
+
+#ifdef STB_DEFINE
+
+#ifndef STB_INTROSPECT_CPP
+ #ifdef __cplusplus
+ #define STB_INTROSPECT_CPP 1
+ #else
+ #define STB_INTROSPECT_CPP 0
+ #endif
+#endif
+
+void stb_introspect_precompiled(stb_info_struct *compiled)
+{
+
+}
+
+
+static void stb__introspect_filename(char *buffer, char *path)
+{
+ #if STB_INTROSPECT_CPP
+ stb_p_sprintf(buffer stb_p_size(9999), "%s/stb_introspect.cpp", path);
+ #else
+ stb_p_sprintf(buffer stb_p_size(9999), "%s/stb_introspect.c", path);
+ #endif
+}
+
+static void stb__introspect_compute(char *path, char *file)
+{
+ int i;
+ char ** include_list = NULL;
+ char ** introspect_list = NULL;
+ FILE *f;
+ f = stb_p_fopen(file, "w");
+ if (!f) return;
+
+ fputs("// if you get compiler errors, change the following 0 to a 1:\n", f);
+ fputs("#define STB_INTROSPECT_INVALID 0\n\n", f);
+ fputs("// this will force the code to compile, and force the introspector\n", f);
+ fputs("// to run and then exit, allowing you to recompile\n\n\n", f);
+ fputs("#include \"stb.h\"\n\n",f );
+ fputs("#if STB_INTROSPECT_INVALID\n", f);
+ fputs(" stb_info_struct stb__introspect_output[] = { { (void *) 1 } };\n", f);
+ fputs("#else\n\n", f);
+ for (i=0; i < stb_arr_len(include_list); ++i)
+ fprintf(f, " #include \"%s\"\n", include_list[i]);
+
+ fputs(" stb_info_struct stb__introspect_output[] =\n{\n", f);
+ for (i=0; i < stb_arr_len(introspect_list); ++i)
+ fprintf(f, " stb_introspect_%s,\n", introspect_list[i]);
+ fputs(" };\n", f);
+ fputs("#endif\n", f);
+ fclose(f);
+}
+
+static stb_info_struct *stb__introspect_info;
+
+#ifndef STB_SHIP
+
+#endif
+
+void stb__introspect(char *path, char *file, stb_info_struct *compiled)
+{
+ static int first=1;
+ if (!first) return;
+ first=0;
+
+ stb__introspect_info = compiled;
+
+ #ifndef STB_SHIP
+ if (path || file) {
+ int bail_flag = compiled && compiled[0].structname == (void *) 1;
+ int needs_building = bail_flag;
+ struct stb__stat st;
+ char buffer[1024], buffer2[1024];
+ if (!path) {
+ stb_splitpath(buffer, file, STB_PATH);
+ path = buffer;
+ }
+ // bail if the source path doesn't exist
+ if (!stb_fexists(path)) return;
+
+ stb__introspect_filename(buffer2, path);
+
+ // get source/include files timestamps, compare to output-file timestamp;
+ // if mismatched, regenerate
+
+ if (stb__stat(buffer2, &st))
+ needs_building = STB_TRUE;
+
+ {
+ // find any file that contains an introspection command and is newer
+ // if needs_building is already true, we don't need to do this test,
+ // but we still need these arrays, so go ahead and get them
+ char **all[3];
+ int i,j;
+ all[0] = stb_readdir_files_mask(path, "*.h");
+ all[1] = stb_readdir_files_mask(path, "*.c");
+ all[2] = stb_readdir_files_mask(path, "*.cpp");
+ if (!needs_building) {
+ for (j=0; j < 3; ++j) {
+ for (i=0; i < stb_arr_len(all[j]); ++i) {
+ struct stb__stat st2;
+ if (!stb__stat(all[j][i], &st2)) {
+ if (st.st_mtime < st2.st_mtime) {
+ char *z = stb_filec(all[j][i], NULL), *y = z;
+ int found=STB_FALSE;
+ while (y) {
+ y = strstr(y, "//si");
+ if (y && isspace(y[4])) {
+ found = STB_TRUE;
+ break;
+ }
+ if (y) y += 4;
+ }
+ free(z);
+ if (found) {
+ needs_building = STB_TRUE;
+ goto done;
+ }
+ }
+ }
+ }
+ }
+ done:;
+ }
+ // collect the files that contain an introspection command,
+ // for the timestamp comparison below
+ char **introspect_h = NULL;
+ for (j=0; j < 3; ++j) {
+ for (i=0; i < stb_arr_len(all[j]); ++i) {
+ char *z = stb_filec(all[j][i], NULL), *y = z;
+ int found=STB_FALSE;
+ while (y) {
+ y = strstr(y, "//si");
+ if (y && isspace(y[4])) {
+ found = STB_TRUE;
+ break;
+ }
+ if (y) y += 4;
+ }
+ if (found)
+ stb_arr_push(introspect_h, stb_p_strdup(all[j][i]));
+ free(z);
+ }
+ }
+ stb_readdir_free(all);
+ if (!needs_building) {
+ for (i=0; i < stb_arr_len(introspect_h); ++i) {
+ struct stb__stat st2;
+ if (!stb__stat(introspect_h[i], &st2))
+ if (st.st_mtime < st2.st_mtime)
+ needs_building = STB_TRUE;
+ }
+ }
+
+ if (needs_building) {
+ stb__introspect_compute(path, buffer2);
+ }
+ }
+ }
+ #endif
+}
+#endif
+#endif
+
+#ifdef STB_INTROSPECT
+// compile-time code-generator
+#define INTROSPECT(x) int main(int argc, char **argv) { stb__introspect(__FILE__); return 0; }
+#define FILE(x)
+
+void stb__introspect(char *filename)
+{
+ char *file = stb_file(filename, NULL);
+ char *s = file, *t, **p;
+ char *out_name = "stb_introspect.c";
+ char *out_path;
+ STB_ARR(char) filelist = NULL;
+ int i,n;
+ if (!file) stb_fatal("Couldn't open %s", filename);
+
+ out_path = stb_splitpathdup(filename, STB_PATH);
+
+ // search for the macros
+ while (*s) {
+ char buffer[256];
+ while (*s && !isupper(*s)) ++s;
+ s = stb_strtok_invert(buffer, s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
+ s = stb_skipwhite(s);
+ if (*s == '(') {
+ ++s;
+ t = strchr(s, ')');
+ if (t == NULL) stb_fatal("Error parsing %s", filename);
+
+ }
+ }
+}
+
+
+
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// STB-C sliding-window dictionary compression
+//
+// This uses a DEFLATE-style sliding window, but no bitwise entropy.
+// Everything is on byte boundaries, so you could then apply a byte-wise
+// entropy code, though that's nowhere near as effective.
+//
+// An STB-C stream begins with a 16-byte header:
+// 4 bytes: 0x57 0xBC 0x00 0x00
+// 8 bytes: big-endian size of decompressed data, 64-bits
+// 4 bytes: big-endian size of window (how far back decompressor may need)
+//
+// The following symbols appear in the stream (these were determined ad hoc,
+// not by analysis):
+//
+// [dict] 00000100 yyyyyyyy yyyyyyyy yyyyyyyy xxxxxxxx xxxxxxxx
+// [END] 00000101 11111010 cccccccc cccccccc cccccccc cccccccc
+// [dict] 00000110 yyyyyyyy yyyyyyyy yyyyyyyy xxxxxxxx
+// [literals] 00000111 zzzzzzzz zzzzzzzz
+// [literals] 00001zzz zzzzzzzz
+// [dict] 00010yyy yyyyyyyy yyyyyyyy xxxxxxxx xxxxxxxx
+// [dict] 00011yyy yyyyyyyy yyyyyyyy xxxxxxxx
+// [literals] 001zzzzz
+// [dict] 01yyyyyy yyyyyyyy xxxxxxxx
+// [dict] 1xxxxxxx yyyyyyyy
+//
+// xxxxxxxx: match length - 1
+// yyyyyyyy: backwards distance - 1
+// zzzzzzzz: num literals - 1
+// cccccccc: adler32 checksum of decompressed data
+// (all big-endian)
+
+
+STB_EXTERN stb_uint stb_decompress_length(stb_uchar *input);
+STB_EXTERN stb_uint stb_decompress(stb_uchar *out,stb_uchar *in,stb_uint len);
+STB_EXTERN stb_uint stb_compress (stb_uchar *out,stb_uchar *in,stb_uint len);
+STB_EXTERN void stb_compress_window(int z);
+STB_EXTERN void stb_compress_hashsize(unsigned int z);
+
+STB_EXTERN int stb_compress_tofile(char *filename, char *in, stb_uint len);
+STB_EXTERN int stb_compress_intofile(FILE *f, char *input, stb_uint len);
+STB_EXTERN char *stb_decompress_fromfile(char *filename, stb_uint *len);
+
+STB_EXTERN int stb_compress_stream_start(FILE *f);
+STB_EXTERN void stb_compress_stream_end(int close);
+STB_EXTERN void stb_write(char *data, int data_len);
+
+#ifdef STB_DEFINE
+
+stb_uint stb_decompress_length(stb_uchar *input)
+{
+ return (input[8] << 24) + (input[9] << 16) + (input[10] << 8) + input[11];
+}
+
+//////////////////// decompressor ///////////////////////
+
+// simple implementation that just writes whole thing into big block
+
+static unsigned char *stb__barrier;
+static unsigned char *stb__barrier2;
+static unsigned char *stb__barrier3;
+static unsigned char *stb__barrier4;
+
+static stb_uchar *stb__dout;
+static void stb__match(stb_uchar *data, stb_uint length)
+{
+ // INVERSE of memmove... write each byte before copying the next...
+ assert (stb__dout + length <= stb__barrier);
+ if (stb__dout + length > stb__barrier) { stb__dout += length; return; }
+ if (data < stb__barrier4) { stb__dout = stb__barrier+1; return; }
+ while (length--) *stb__dout++ = *data++;
+}
+
+static void stb__lit(stb_uchar *data, stb_uint length)
+{
+ assert (stb__dout + length <= stb__barrier);
+ if (stb__dout + length > stb__barrier) { stb__dout += length; return; }
+ if (data < stb__barrier2) { stb__dout = stb__barrier+1; return; }
+ memcpy(stb__dout, data, length);
+ stb__dout += length;
+}
+
+#define stb__in2(x) ((i[x] << 8) + i[(x)+1])
+#define stb__in3(x) ((i[x] << 16) + stb__in2((x)+1))
+#define stb__in4(x) ((i[x] << 24) + stb__in3((x)+1))
+
+static stb_uchar *stb_decompress_token(stb_uchar *i)
+{
+ if (*i >= 0x20) { // use fewer if's for cases that expand small
+ if (*i >= 0x80) stb__match(stb__dout-i[1]-1, i[0] - 0x80 + 1), i += 2;
+ else if (*i >= 0x40) stb__match(stb__dout-(stb__in2(0) - 0x4000 + 1), i[2]+1), i += 3;
+ else /* *i >= 0x20 */ stb__lit(i+1, i[0] - 0x20 + 1), i += 1 + (i[0] - 0x20 + 1);
+ } else { // more ifs for cases that expand large, since overhead is amortized
+ if (*i >= 0x18) stb__match(stb__dout-(stb__in3(0) - 0x180000 + 1), i[3]+1), i += 4;
+ else if (*i >= 0x10) stb__match(stb__dout-(stb__in3(0) - 0x100000 + 1), stb__in2(3)+1), i += 5;
+ else if (*i >= 0x08) stb__lit(i+2, stb__in2(0) - 0x0800 + 1), i += 2 + (stb__in2(0) - 0x0800 + 1);
+ else if (*i == 0x07) stb__lit(i+3, stb__in2(1) + 1), i += 3 + (stb__in2(1) + 1);
+ else if (*i == 0x06) stb__match(stb__dout-(stb__in3(1)+1), i[4]+1), i += 5;
+ else if (*i == 0x04) stb__match(stb__dout-(stb__in3(1)+1), stb__in2(4)+1), i += 6;
+ }
+ return i;
+}
+
+stb_uint stb_decompress(stb_uchar *output, stb_uchar *i, stb_uint length)
+{
+ stb_uint olen;
+ if (stb__in4(0) != 0x57bc0000) return 0;
+ if (stb__in4(4) != 0) return 0; // error! stream is > 4GB
+ olen = stb_decompress_length(i);
+ stb__barrier2 = i;
+ stb__barrier3 = i+length;
+ stb__barrier = output + olen;
+ stb__barrier4 = output;
+ i += 16;
+
+ stb__dout = output;
+ while (1) {
+ stb_uchar *old_i = i;
+ i = stb_decompress_token(i);
+ if (i == old_i) {
+ if (*i == 0x05 && i[1] == 0xfa) {
+ assert(stb__dout == output + olen);
+ if (stb__dout != output + olen) return 0;
+ if (stb_adler32(1, output, olen) != (stb_uint) stb__in4(2))
+ return 0;
+ return olen;
+ } else {
+ assert(0); /* NOTREACHED */
+ return 0;
+ }
+ }
+ assert(stb__dout <= output + olen);
+ if (stb__dout > output + olen)
+ return 0;
+ }
+}
+
+char *stb_decompress_fromfile(char *filename, unsigned int *len)
+{
+ unsigned int n;
+ char *q;
+ unsigned char *p;
+ FILE *f = stb_p_fopen(filename, "rb"); if (f == NULL) return NULL;
+ fseek(f, 0, SEEK_END);
+ n = ftell(f);
+ fseek(f, 0, SEEK_SET);
+ p = (unsigned char *) malloc(n); if (p == NULL) { fclose(f); return NULL; }
+ fread(p, 1, n, f);
+ fclose(f);
+ if (p[0] != 0x57 || p[1] != 0xBC || p[2] || p[3]) { free(p); return NULL; }
+ q = (char *) malloc(stb_decompress_length(p)+1);
+ if (!q) { free(p); return NULL; }
+ *len = stb_decompress((unsigned char *) q, p, n);
+ if (*len) q[*len] = 0;
+ free(p);
+ return q;
+}
+
+#if 0
+// streaming decompressor
+
+static struct
+{
+ stb_uchar *in_buffer;
+ stb_uchar *match;
+
+ stb_uint pending_literals;
+ stb_uint pending_match;
+} xx;
+
+
+
+static void stb__match(stb_uchar *data, stb_uint length)
+{
+ // INVERSE of memmove... write each byte before copying the next...
+ assert (stb__dout + length <= stb__barrier);
+ if (stb__dout + length > stb__barrier) { stb__dout += length; return; }
+ if (data < stb__barrier2) { stb__dout = stb__barrier+1; return; }
+ while (length--) *stb__dout++ = *data++;
+}
+
+static void stb__lit(stb_uchar *data, stb_uint length)
+{
+ assert (stb__dout + length <= stb__barrier);
+ if (stb__dout + length > stb__barrier) { stb__dout += length; return; }
+ if (data < stb__barrier2) { stb__dout = stb__barrier+1; return; }
+ memcpy(stb__dout, data, length);
+ stb__dout += length;
+}
+
+static void sx_match(stb_uchar *data, stb_uint length)
+{
+ xx.match = data;
+ xx.pending_match = length;
+}
+
+static void sx_lit(stb_uchar *data, stb_uint length)
+{
+ xx.pending_literals = length;
+}
+
+static int stb_decompress_token_state(void)
+{
+ stb_uchar *i = xx.in_buffer;
+
+ if (*i >= 0x20) { // use fewer if's for cases that expand small
+ if (*i >= 0x80) sx_match(stb__dout-i[1]-1, i[0] - 0x80 + 1), i += 2;
+ else if (*i >= 0x40) sx_match(stb__dout-(stb__in2(0) - 0x4000 + 1), i[2]+1), i += 3;
+ else /* *i >= 0x20 */ sx_lit(i+1, i[0] - 0x20 + 1), i += 1;
+ } else { // more ifs for cases that expand large, since overhead is amortized
+ if (*i >= 0x18) sx_match(stb__dout-(stb__in3(0) - 0x180000 + 1), i[3]+1), i += 4;
+ else if (*i >= 0x10) sx_match(stb__dout-(stb__in3(0) - 0x100000 + 1), stb__in2(3)+1), i += 5;
+ else if (*i >= 0x08) sx_lit(i+2, stb__in2(0) - 0x0800 + 1), i += 2;
+ else if (*i == 0x07) sx_lit(i+3, stb__in2(1) + 1), i += 3;
+ else if (*i == 0x06) sx_match(stb__dout-(stb__in3(1)+1), i[4]+1), i += 5;
+ else if (*i == 0x04) sx_match(stb__dout-(stb__in3(1)+1), stb__in2(4)+1), i += 6;
+ else return 0;
+ }
+ xx.in_buffer = i;
+ return 1;
+}
+#endif
+
+
+
+//////////////////// compressor ///////////////////////
+
+static unsigned int stb_matchlen(stb_uchar *m1, stb_uchar *m2, stb_uint maxlen)
+{
+ stb_uint i;
+ for (i=0; i < maxlen; ++i)
+ if (m1[i] != m2[i]) return i;
+ return i;
+}
+
+// simple implementation that just takes the source data in a big block
+
+static stb_uchar *stb__out;
+static FILE *stb__outfile;
+static stb_uint stb__outbytes;
+
+static void stb__write(unsigned char v)
+{
+ fputc(v, stb__outfile);
+ ++stb__outbytes;
+}
+
+#define stb_out(v) (stb__out ? (void)(*stb__out++ = (stb_uchar) (v)) : stb__write((stb_uchar) (v)))
+
+static void stb_out2(stb_uint v)
+{
+ stb_out(v >> 8);
+ stb_out(v);
+}
+
+static void stb_out3(stb_uint v) { stb_out(v >> 16); stb_out(v >> 8); stb_out(v); }
+static void stb_out4(stb_uint v) { stb_out(v >> 24); stb_out(v >> 16);
+ stb_out(v >> 8 ); stb_out(v); }
+
+static void outliterals(stb_uchar *in, ptrdiff_t numlit)
+{
+ while (numlit > 65536) {
+ outliterals(in,65536);
+ in += 65536;
+ numlit -= 65536;
+ }
+
+ if (numlit == 0) ;
+ else if (numlit <= 32) stb_out (0x000020 + (stb_uint) numlit-1);
+ else if (numlit <= 2048) stb_out2(0x000800 + (stb_uint) numlit-1);
+ else /* numlit <= 65536) */ stb_out3(0x070000 + (stb_uint) numlit-1);
+
+ if (stb__out) {
+ memcpy(stb__out,in,numlit);
+ stb__out += numlit;
+ } else
+ fwrite(in, 1, numlit, stb__outfile);
+}
+
+static int stb__window = 0x40000; // 256K
+void stb_compress_window(int z)
+{
+ if (z >= 0x1000000) z = 0x1000000; // limit of implementation
+ if (z < 0x100) z = 0x100; // insanely small
+ stb__window = z;
+}
+
+static int stb_not_crap(int best, int dist)
+{
+ return ((best > 2 && dist <= 0x00100)
+ || (best > 5 && dist <= 0x04000)
+ || (best > 7 && dist <= 0x80000));
+}
+
+static stb_uint stb__hashsize = 32768;
+void stb_compress_hashsize(unsigned int y)
+{
+ unsigned int z = 1024;
+ while (z < y) z <<= 1;
+ stb__hashsize = z >> 2; // pass in bytes, store #pointers
+}
+
+// note that you can play with the hashing functions all you
+// want without needing to change the decompressor
+#define stb__hc(q,h,c) (((h) << 7) + ((h) >> 25) + q[c])
+#define stb__hc2(q,h,c,d) (((h) << 14) + ((h) >> 18) + (q[c] << 7) + q[d])
+#define stb__hc3(q,c,d,e) ((q[c] << 14) + (q[d] << 7) + q[e])
+
+static stb_uint32 stb__running_adler;
+
+static int stb_compress_chunk(stb_uchar *history,
+ stb_uchar *start,
+ stb_uchar *end,
+ int length,
+ int *pending_literals,
+ stb_uchar **chash,
+ stb_uint mask)
+{
+ int window = stb__window;
+ stb_uint match_max;
+ stb_uchar *lit_start = start - *pending_literals;
+ stb_uchar *q = start;
+
+ #define STB__SCRAMBLE(h) (((h) + ((h) >> 16)) & mask)
+
+ // stop short of the end so we don't scan off the end doing
+ // the hashing; this means we won't compress the last few bytes
+ // unless they were part of something longer
+ while (q < start+length && q+12 < end) {
+ int m;
+ stb_uint h1,h2,h3,h4, h;
+ stb_uchar *t;
+ int best = 2, dist=0;
+
+ if (q+65536 > end)
+ match_max = (stb_uint) (end-q);
+ else
+ match_max = 65536u;
+
+ #define stb__nc(b,d) ((d) <= window && ((b) > 9 || stb_not_crap(b,d)))
+
+ #define STB__TRY(t,p) /* avoid retrying a match we already tried */ \
+ if (p ? dist != (int) (q-t) : 1) \
+ if ((m = (int) stb_matchlen(t, q, match_max)) > best)\
+ if (stb__nc(m,(int) (q-(t)))) \
+ best = m, dist = (int) (q - (t))
+
+ // rather than search for all matches, only try 4 candidate locations,
+ // chosen based on 4 different hash functions of different lengths.
+ // this strategy is inspired by LZO; hashing is unrolled here using the
+ // 'hc' macro
+ h = stb__hc3(q,0, 1, 2); h1 = STB__SCRAMBLE(h);
+ t = chash[h1]; if (t) STB__TRY(t,0);
+ h = stb__hc2(q,h, 3, 4); h2 = STB__SCRAMBLE(h);
+ h = stb__hc2(q,h, 5, 6); t = chash[h2]; if (t) STB__TRY(t,1);
+ h = stb__hc2(q,h, 7, 8); h3 = STB__SCRAMBLE(h);
+ h = stb__hc2(q,h, 9,10); t = chash[h3]; if (t) STB__TRY(t,1);
+ h = stb__hc2(q,h,11,12); h4 = STB__SCRAMBLE(h);
+ t = chash[h4]; if (t) STB__TRY(t,1);
+
+ // because we use a shared hash table, can only update it
+ // _after_ we've probed all of them
+ chash[h1] = chash[h2] = chash[h3] = chash[h4] = q;
+
+ if (best > 2)
+ assert(dist > 0);
+
+ // see if our best match qualifies
+ if (best < 3) { // fast path literals
+ ++q;
+ } else if (best > 2 && best <= 0x80 && dist <= 0x100) {
+ outliterals(lit_start, q-lit_start); lit_start = (q += best);
+ stb_out(0x80 + best-1);
+ stb_out(dist-1);
+ } else if (best > 5 && best <= 0x100 && dist <= 0x4000) {
+ outliterals(lit_start, q-lit_start); lit_start = (q += best);
+ stb_out2(0x4000 + dist-1);
+ stb_out(best-1);
+ } else if (best > 7 && best <= 0x100 && dist <= 0x80000) {
+ outliterals(lit_start, q-lit_start); lit_start = (q += best);
+ stb_out3(0x180000 + dist-1);
+ stb_out(best-1);
+ } else if (best > 8 && best <= 0x10000 && dist <= 0x80000) {
+ outliterals(lit_start, q-lit_start); lit_start = (q += best);
+ stb_out3(0x100000 + dist-1);
+ stb_out2(best-1);
+ } else if (best > 9 && dist <= 0x1000000) {
+ if (best > 65536) best = 65536;
+ outliterals(lit_start, q-lit_start); lit_start = (q += best);
+ if (best <= 0x100) {
+ stb_out(0x06);
+ stb_out3(dist-1);
+ stb_out(best-1);
+ } else {
+ stb_out(0x04);
+ stb_out3(dist-1);
+ stb_out2(best-1);
+ }
+ } else { // fallback literals if no match was a balanced tradeoff
+ ++q;
+ }
+ }
+
+ // if we didn't get all the way, add the rest to literals
+ if (q-start < length)
+ q = start+length;
+
+ // the literals are everything from lit_start to q
+ *pending_literals = (int) (q - lit_start);
+
+ stb__running_adler = stb_adler32(stb__running_adler, start, (int) (q - start));
+ return (int) (q - start);
+}
+
+static int stb_compress_inner(stb_uchar *input, stb_uint length)
+{
+ int literals = 0;
+ stb_uint len,i;
+
+ stb_uchar **chash;
+ chash = (stb_uchar**) malloc(stb__hashsize * sizeof(stb_uchar*));
+ if (chash == NULL) return 0; // failure
+ for (i=0; i < stb__hashsize; ++i)
+ chash[i] = NULL;
+
+ // stream signature
+ stb_out(0x57); stb_out(0xbc);
+ stb_out2(0);
+
+ stb_out4(0); // 64-bit length requires 32-bit leading 0
+ stb_out4(length);
+ stb_out4(stb__window);
+
+ stb__running_adler = 1;
+
+ len = stb_compress_chunk(input, input, input+length, length, &literals, chash, stb__hashsize-1);
+ assert(len == length);
+
+ outliterals(input+length - literals, literals);
+
+ free(chash);
+
+ stb_out2(0x05fa); // end opcode
+
+ stb_out4(stb__running_adler);
+
+ return 1; // success
+}
+
+stb_uint stb_compress(stb_uchar *out, stb_uchar *input, stb_uint length)
+{
+ stb__out = out;
+ stb__outfile = NULL;
+
+ stb_compress_inner(input, length);
+
+ return (stb_uint) (stb__out - out);
+}
+
+int stb_compress_tofile(char *filename, char *input, unsigned int length)
+{
+ //int maxlen = length + 512 + (length >> 2); // total guess
+ //char *buffer = (char *) malloc(maxlen);
+ //int blen = stb_compress((stb_uchar*)buffer, (stb_uchar*)input, length);
+
+ stb__out = NULL;
+ stb__outfile = stb_p_fopen(filename, "wb");
+ if (!stb__outfile) return 0;
+
+ stb__outbytes = 0;
+
+ if (!stb_compress_inner((stb_uchar*)input, length))
+ return 0;
+
+ fclose(stb__outfile);
+
+ return stb__outbytes;
+}
+
+int stb_compress_intofile(FILE *f, char *input, unsigned int length)
+{
+ //int maxlen = length + 512 + (length >> 2); // total guess
+ //char *buffer = (char*)malloc(maxlen);
+ //int blen = stb_compress((stb_uchar*)buffer, (stb_uchar*)input, length);
+
+ stb__out = NULL;
+ stb__outfile = f;
+ if (!stb__outfile) return 0;
+
+ stb__outbytes = 0;
+
+ if (!stb_compress_inner((stb_uchar*)input, length))
+ return 0;
+
+ return stb__outbytes;
+}
+
+////////////////////// streaming I/O version /////////////////////
+
+
+static size_t stb_out_backpatch_id(void)
+{
+ if (stb__out)
+ return (size_t) stb__out;
+ else
+ return ftell(stb__outfile);
+}
+
+static void stb_out_backpatch(size_t id, stb_uint value)
+{
+ stb_uchar data[4] = { (stb_uchar)(value >> 24), (stb_uchar)(value >> 16), (stb_uchar)(value >> 8), (stb_uchar)(value) };
+ if (stb__out) {
+ memcpy((void *) id, data, 4);
+ } else {
+ stb_uint where = ftell(stb__outfile);
+ fseek(stb__outfile, (long) id, SEEK_SET);
+ fwrite(data, 4, 1, stb__outfile);
+ fseek(stb__outfile, where, SEEK_SET);
+ }
+}
+
+// ok, the wraparound buffer was a total failure. let's instead
+// use a copying-in-place buffer, which lets us share the code.
+// This is way less efficient but it'll do for now.
+
+static struct
+{
+ stb_uchar *buffer;
+ int size; // physical size of buffer in bytes
+
+ int valid; // amount of valid data in bytes
+ int start; // bytes of data already output
+
+ int window;
+ int fsize;
+
+ int pending_literals; // bytes not-quite output but counted in start
+ int length_id;
+
+ stb_uint total_bytes;
+
+ stb_uchar **chash;
+ stb_uint hashmask;
+} xtb;
+
+static int stb_compress_streaming_start(void)
+{
+ stb_uint i;
+ xtb.size = stb__window * 3;
+ xtb.buffer = (stb_uchar*)malloc(xtb.size);
+ if (!xtb.buffer) return 0;
+
+ xtb.chash = (stb_uchar**)malloc(sizeof(*xtb.chash) * stb__hashsize);
+ if (!xtb.chash) {
+ free(xtb.buffer);
+ return 0;
+ }
+
+ for (i=0; i < stb__hashsize; ++i)
+ xtb.chash[i] = NULL;
+
+ xtb.hashmask = stb__hashsize-1;
+
+ xtb.valid = 0;
+ xtb.start = 0;
+ xtb.window = stb__window;
+ xtb.fsize = stb__window;
+ xtb.pending_literals = 0;
+ xtb.total_bytes = 0;
+
+ // stream signature
+ stb_out(0x57); stb_out(0xbc); stb_out2(0);
+
+ stb_out4(0); // 64-bit length requires 32-bit leading 0
+
+ xtb.length_id = (int) stb_out_backpatch_id();
+ stb_out4(0); // we don't know the output length yet
+
+ stb_out4(stb__window);
+
+ stb__running_adler = 1;
+
+ return 1;
+}
+
+static int stb_compress_streaming_end(void)
+{
+ // flush out any remaining data
+ stb_compress_chunk(xtb.buffer, xtb.buffer+xtb.start, xtb.buffer+xtb.valid,
+ xtb.valid-xtb.start, &xtb.pending_literals, xtb.chash, xtb.hashmask);
+
+ // write out pending literals
+ outliterals(xtb.buffer + xtb.valid - xtb.pending_literals, xtb.pending_literals);
+
+ stb_out2(0x05fa); // end opcode
+ stb_out4(stb__running_adler);
+
+ stb_out_backpatch(xtb.length_id, xtb.total_bytes);
+
+ free(xtb.buffer);
+ free(xtb.chash);
+ return 1;
+}
+
+void stb_write(char *data, int data_len)
+{
+ stb_uint i;
+
+ // @TODO: fast path for filling the buffer and doing nothing else
+ // if (xtb.valid + data_len < xtb.size)
+
+ xtb.total_bytes += data_len;
+
+ while (data_len) {
+ // fill buffer
+ if (xtb.valid < xtb.size) {
+ int amt = xtb.size - xtb.valid;
+ if (data_len < amt) amt = data_len;
+ memcpy(xtb.buffer + xtb.valid, data, amt);
+ data_len -= amt;
+ data += amt;
+ xtb.valid += amt;
+ }
+ if (xtb.valid < xtb.size)
+ return;
+
+ // at this point, the buffer is full
+
+ // if we can process some data, go for it; make sure
+ // we leave an 'fsize's worth of data, though
+ if (xtb.start + xtb.fsize < xtb.valid) {
+ int amount = (xtb.valid - xtb.fsize) - xtb.start;
+ int n;
+ assert(amount > 0);
+ n = stb_compress_chunk(xtb.buffer, xtb.buffer + xtb.start, xtb.buffer + xtb.valid,
+ amount, &xtb.pending_literals, xtb.chash, xtb.hashmask);
+ xtb.start += n;
+ }
+
+ assert(xtb.start + xtb.fsize >= xtb.valid);
+ // at this point, our future size is too small, so we
+ // need to flush some history. we, in fact, flush exactly
+ // one window's worth of history
+
+ {
+ int flush = xtb.window;
+ assert(xtb.start >= flush);
+ assert(xtb.valid >= flush);
+
+ // if 'pending literals' extends back into the shift region,
+ // write them out
+ if (xtb.start - xtb.pending_literals < flush) {
+ outliterals(xtb.buffer + xtb.start - xtb.pending_literals, xtb.pending_literals);
+ xtb.pending_literals = 0;
+ }
+
+ // now shift the window
+ memmove(xtb.buffer, xtb.buffer + flush, xtb.valid - flush);
+ xtb.start -= flush;
+ xtb.valid -= flush;
+
+ for (i=0; i <= xtb.hashmask; ++i)
+ if (xtb.chash[i] < xtb.buffer + flush)
+ xtb.chash[i] = NULL;
+ else
+ xtb.chash[i] -= flush;
+ }
+ // and now that we've made room for more data, go back to the top
+ }
+}
+
+int stb_compress_stream_start(FILE *f)
+{
+ stb__out = NULL;
+ stb__outfile = f;
+
+ if (f == NULL)
+ return 0;
+
+ if (!stb_compress_streaming_start())
+ return 0;
+
+ return 1;
+}
+
+void stb_compress_stream_end(int close)
+{
+ stb_compress_streaming_end();
+ if (close && stb__outfile) {
+ fclose(stb__outfile);
+ }
+}
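+
+// Example usage (illustrative sketch only; "out.bin" and 'data'/'data_len'
+// are hypothetical):
+//
+//    FILE *f = fopen("out.bin", "wb");
+//    if (f && stb_compress_stream_start(f)) {
+//       stb_write(data, data_len);     // can be called any number of times
+//       stb_compress_stream_end(1);    // non-zero 'close' also fcloses f
+//    }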
+
+#endif // STB_DEFINE
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// File abstraction... tired of not having this... we can write
+// compressors to be layers over these that auto-close their children.
+
+
+typedef struct stbfile
+{
+ int (*getbyte)(struct stbfile *); // -1 on EOF
+ unsigned int (*getdata)(struct stbfile *, void *block, unsigned int len);
+
+ int (*putbyte)(struct stbfile *, int byte);
+ unsigned int (*putdata)(struct stbfile *, void *block, unsigned int len);
+
+ unsigned int (*size)(struct stbfile *);
+
+ unsigned int (*tell)(struct stbfile *);
+ void (*backpatch)(struct stbfile *, unsigned int tell, void *block, unsigned int len);
+
+ void (*close)(struct stbfile *);
+
+ FILE *f; // file to fread/fwrite
+ unsigned char *buffer; // input/output buffer
+ unsigned char *indata, *inend; // input buffer
+ union {
+ int various;
+ void *ptr;
+ };
+} stbfile;
+
+STB_EXTERN unsigned int stb_getc(stbfile *f); // read
+STB_EXTERN int stb_putc(stbfile *f, int ch); // write
+STB_EXTERN unsigned int stb_getdata(stbfile *f, void *buffer, unsigned int len); // read
+STB_EXTERN unsigned int stb_putdata(stbfile *f, void *buffer, unsigned int len); // write
+STB_EXTERN unsigned int stb_tell(stbfile *f); // read
+STB_EXTERN unsigned int stb_size(stbfile *f); // read/write
+STB_EXTERN void stb_backpatch(stbfile *f, unsigned int tell, void *buffer, unsigned int len); // write
+
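+// Example usage (illustrative sketch only; 'buf'/'buflen' are hypothetical):
+//
+//    stbfile *s = stb_open_inbuffer(buf, buflen);  // or stb_openf(fopen(...))
+//    if (s) {
+//       int c = (int) stb_getc(s);      // one byte at a time, -1 on EOF
+//       unsigned int n = stb_size(s);   // total size, where supported
+//       stb_close(s);                   // dispatches s->close, then free(s)
+//    }
+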
+#ifdef STB_DEFINE
+
+unsigned int stb_getc(stbfile *f) { return f->getbyte(f); }
+int stb_putc(stbfile *f, int ch) { return f->putbyte(f, ch); }
+
+unsigned int stb_getdata(stbfile *f, void *buffer, unsigned int len)
+{
+ return f->getdata(f, buffer, len);
+}
+unsigned int stb_putdata(stbfile *f, void *buffer, unsigned int len)
+{
+ return f->putdata(f, buffer, len);
+}
+void stb_close(stbfile *f)
+{
+ f->close(f);
+ free(f);
+}
+unsigned int stb_tell(stbfile *f) { return f->tell(f); }
+unsigned int stb_size(stbfile *f) { return f->size(f); }
+void stb_backpatch(stbfile *f, unsigned int tell, void *buffer, unsigned int len)
+{
+ f->backpatch(f,tell,buffer,len);
+}
+
+// FILE * implementation
+static int stb__fgetbyte(stbfile *f) { return fgetc(f->f); }
+static int stb__fputbyte(stbfile *f, int ch) { return fputc(ch, f->f)==0; }
+static unsigned int stb__fgetdata(stbfile *f, void *buffer, unsigned int len) { return (unsigned int) fread(buffer,1,len,f->f); }
+static unsigned int stb__fputdata(stbfile *f, void *buffer, unsigned int len) { return (unsigned int) fwrite(buffer,1,len,f->f); }
+static unsigned int stb__fsize(stbfile *f) { return (unsigned int) stb_filelen(f->f); }
+static unsigned int stb__ftell(stbfile *f) { return (unsigned int) ftell(f->f); }
+static void stb__fbackpatch(stbfile *f, unsigned int where, void *buffer, unsigned int len)
+{
+ fseek(f->f, where, SEEK_SET);
+ fwrite(buffer, 1, len, f->f);
+ fseek(f->f, 0, SEEK_END);
+}
+static void stb__fclose(stbfile *f) { fclose(f->f); }
+
+stbfile *stb_openf(FILE *f)
+{
+ stbfile m = { stb__fgetbyte, stb__fgetdata,
+ stb__fputbyte, stb__fputdata,
+ stb__fsize, stb__ftell, stb__fbackpatch, stb__fclose,
+ 0,0,0, };
+ stbfile *z = (stbfile *) malloc(sizeof(*z));
+ if (z) {
+ *z = m;
+ z->f = f;
+ }
+ return z;
+}
+
+static int stb__nogetbyte(stbfile *f) { assert(0); return -1; }
+static unsigned int stb__nogetdata(stbfile *f, void *buffer, unsigned int len) { assert(0); return 0; }
+static int stb__noputbyte(stbfile *f, int ch) { assert(0); return 0; }
+static unsigned int stb__noputdata(stbfile *f, void *buffer, unsigned int len) { assert(0); return 0; }
+static void stb__nobackpatch(stbfile *f, unsigned int where, void *buffer, unsigned int len) { assert(0); }
+
+static int stb__bgetbyte(stbfile *s)
+{
+ if (s->indata < s->inend)
+ return *s->indata++;
+ else
+ return -1;
+}
+
+static unsigned int stb__bgetdata(stbfile *s, void *buffer, unsigned int len)
+{
+ if (s->indata + len > s->inend)
+ len = (unsigned int) (s->inend - s->indata);
+ memcpy(buffer, s->indata, len);
+ s->indata += len;
+ return len;
+}
+static unsigned int stb__bsize(stbfile *s) { return (unsigned int) (s->inend - s->buffer); }
+static unsigned int stb__btell(stbfile *s) { return (unsigned int) (s->indata - s->buffer); }
+
+static void stb__bclose(stbfile *s)
+{
+ if (s->various)
+ free(s->buffer);
+}
+
+stbfile *stb_open_inbuffer(void *buffer, unsigned int len)
+{
+ stbfile m = { stb__bgetbyte, stb__bgetdata,
+ stb__noputbyte, stb__noputdata,
+ stb__bsize, stb__btell, stb__nobackpatch, stb__bclose };
+ stbfile *z = (stbfile *) malloc(sizeof(*z));
+ if (z) {
+ *z = m;
+ z->buffer = (unsigned char *) buffer;
+ z->indata = z->buffer;
+ z->inend = z->indata + len;
+ }
+ return z;
+}
+
+stbfile *stb_open_inbuffer_free(void *buffer, unsigned int len)
+{
+ stbfile *z = stb_open_inbuffer(buffer, len);
+ if (z)
+ z->various = 1; // free
+ return z;
+}
+
+#ifndef STB_VERSION
+// if we've been cut-and-pasted elsewhere, you get a limited
+// version of stb_open, without the 'k' flag and utf8 support
+static void stb__fclose2(stbfile *f)
+{
+ fclose(f->f);
+}
+
+stbfile *stb_open(char *filename, char *mode)
+{
+ FILE *f = stb_p_fopen(filename, mode);
+ stbfile *s;
+ if (f == NULL) return NULL;
+ s = stb_openf(f);
+ if (s)
+ s->close = stb__fclose2;
+ return s;
+}
+#else
+// the full version depends on some code in stb.h; this
+// also includes the memory buffer output format implemented with stb_arr
+static void stb__fclose2(stbfile *f)
+{
+ stb_fclose(f->f, f->various);
+}
+
+stbfile *stb_open(char *filename, char *mode)
+{
+ FILE *f = stb_fopen(filename, mode[0] == 'k' ? mode+1 : mode);
+ stbfile *s;
+ if (f == NULL) return NULL;
+ s = stb_openf(f);
+ if (s) {
+ s->close = stb__fclose2;
+ s->various = mode[0] == 'k' ? stb_keep_if_different : stb_keep_yes;
+ }
+ return s;
+}
+
+static int stb__aputbyte(stbfile *f, int ch)
+{
+ stb_arr_push(f->buffer, ch);
+ return 1;
+}
+static unsigned int stb__aputdata(stbfile *f, void *data, unsigned int len)
+{
+ memcpy(stb_arr_addn(f->buffer, (int) len), data, len);
+ return len;
+}
+static unsigned int stb__asize(stbfile *f) { return stb_arr_len(f->buffer); }
+static void stb__abackpatch(stbfile *f, unsigned int where, void *data, unsigned int len)
+{
+ memcpy(f->buffer+where, data, len);
+}
+static void stb__aclose(stbfile *f)
+{
+ *(unsigned char **) f->ptr = f->buffer;
+}
+
+stbfile *stb_open_outbuffer(unsigned char **update_on_close)
+{
+ stbfile m = { stb__nogetbyte, stb__nogetdata,
+ stb__aputbyte, stb__aputdata,
+ stb__asize, stb__asize, stb__abackpatch, stb__aclose };
+ stbfile *z = (stbfile *) malloc(sizeof(*z));
+ if (z) {
+      *z = m;
+      z->ptr = update_on_close; // set after *z = m, which zeroes the union
+ }
+ return z;
+}
+#endif
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Arithmetic coder... based on cbloom's notes on the subject, should be
+// less code than a huffman code.
+
+typedef struct
+{
+ unsigned int range_low;
+ unsigned int range_high;
+ unsigned int code, range; // decode
+ int buffered_u8;
+ int pending_ffs;
+ stbfile *output;
+} stb_arith;
+
+STB_EXTERN void stb_arith_init_encode(stb_arith *a, stbfile *out);
+STB_EXTERN void stb_arith_init_decode(stb_arith *a, stbfile *in);
+STB_EXTERN stbfile *stb_arith_encode_close(stb_arith *a);
+STB_EXTERN stbfile *stb_arith_decode_close(stb_arith *a);
+
+STB_EXTERN void stb_arith_encode(stb_arith *a, unsigned int totalfreq, unsigned int freq, unsigned int cumfreq);
+STB_EXTERN void stb_arith_encode_log2(stb_arith *a, unsigned int totalfreq2, unsigned int freq, unsigned int cumfreq);
+STB_EXTERN unsigned int stb_arith_decode_value(stb_arith *a, unsigned int totalfreq);
+STB_EXTERN void stb_arith_decode_advance(stb_arith *a, unsigned int totalfreq, unsigned int freq, unsigned int cumfreq);
+STB_EXTERN unsigned int stb_arith_decode_value_log2(stb_arith *a, unsigned int totalfreq2);
+STB_EXTERN void stb_arith_decode_advance_log2(stb_arith *a, unsigned int totalfreq2, unsigned int freq, unsigned int cumfreq);
+
+STB_EXTERN void stb_arith_encode_byte(stb_arith *a, int byte);
+STB_EXTERN int stb_arith_decode_byte(stb_arith *a);
+
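+// Example encode/decode shape (illustrative sketch only; 'freq', 'cumfreq',
+// 'totalfreq' and 'sym' are a hypothetical frequency model that the encoder
+// and decoder must maintain identically):
+//
+//    stb_arith enc;
+//    stb_arith_init_encode(&enc, out);
+//    stb_arith_encode(&enc, totalfreq, freq[sym], cumfreq[sym]);
+//    stb_arith_encode_close(&enc);
+//
+//    stb_arith dec;
+//    stb_arith_init_decode(&dec, in);
+//    unsigned int v = stb_arith_decode_value(&dec, totalfreq);
+//    // ...map v back to a symbol 'sym', then:
+//    stb_arith_decode_advance(&dec, totalfreq, freq[sym], cumfreq[sym]);
+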
+// this is a memory-inefficient way of doing things, but it's
+// fast(?) and simple
+typedef struct
+{
+ unsigned short cumfreq;
+ unsigned short samples;
+} stb_arith_symstate_item;
+
+typedef struct
+{
+ int num_sym;
+ unsigned int pow2;
+ int countdown;
+ stb_arith_symstate_item data[1];
+} stb_arith_symstate;
+
+#ifdef STB_DEFINE
+void stb_arith_init_encode(stb_arith *a, stbfile *out)
+{
+ a->range_low = 0;
+ a->range_high = 0xffffffff;
+ a->pending_ffs = -1; // means no buffered character currently, to speed up normal case
+ a->output = out;
+}
+
+static void stb__arith_carry(stb_arith *a)
+{
+ int i;
+ assert(a->pending_ffs != -1); // can't carry with no data
+   stb_putc(a->output, a->buffered_u8 + 1); // carry increments the buffered byte
+ for (i=0; i < a->pending_ffs; ++i)
+ stb_putc(a->output, 0);
+}
+
+static void stb__arith_putbyte(stb_arith *a, int byte)
+{
+ if (a->pending_ffs) {
+ if (a->pending_ffs == -1) { // means no buffered data; encoded for fast path efficiency
+ if (byte == 0xff)
+ stb_putc(a->output, byte); // just write it immediately
+ else {
+ a->buffered_u8 = byte;
+ a->pending_ffs = 0;
+ }
+ } else if (byte == 0xff) {
+ ++a->pending_ffs;
+      } else {
+         int i;
+         stb_putc(a->output, a->buffered_u8);
+         for (i=0; i < a->pending_ffs; ++i)
+            stb_putc(a->output, 0xff);
+         a->buffered_u8 = byte; // start buffering the new byte
+         a->pending_ffs = 0;
+      }
+ } else if (byte == 0xff) {
+ ++a->pending_ffs;
+ } else {
+ // fast path
+ stb_putc(a->output, a->buffered_u8);
+ a->buffered_u8 = byte;
+ }
+}
+
+static void stb__arith_flush(stb_arith *a)
+{
+ if (a->pending_ffs >= 0) {
+ int i;
+ stb_putc(a->output, a->buffered_u8);
+ for (i=0; i < a->pending_ffs; ++i)
+ stb_putc(a->output, 0xff);
+ }
+}
+
+static void stb__renorm_encoder(stb_arith *a)
+{
+ stb__arith_putbyte(a, a->range_low >> 24);
+ a->range_low <<= 8;
+ a->range_high = (a->range_high << 8) | 0xff;
+}
+
+static void stb__renorm_decoder(stb_arith *a)
+{
+ int c = stb_getc(a->output);
+ a->code = (a->code << 8) + (c >= 0 ? c : 0); // if EOF, insert 0
+}
+
+void stb_arith_encode(stb_arith *a, unsigned int totalfreq, unsigned int freq, unsigned int cumfreq)
+{
+ unsigned int range = a->range_high - a->range_low;
+ unsigned int old = a->range_low;
+ range /= totalfreq;
+ a->range_low += range * cumfreq;
+ a->range_high = a->range_low + range*freq;
+ if (a->range_low < old)
+ stb__arith_carry(a);
+ while (a->range_high - a->range_low < 0x1000000)
+ stb__renorm_encoder(a);
+}
+
+void stb_arith_encode_log2(stb_arith *a, unsigned int totalfreq2, unsigned int freq, unsigned int cumfreq)
+{
+ unsigned int range = a->range_high - a->range_low;
+ unsigned int old = a->range_low;
+ range >>= totalfreq2;
+ a->range_low += range * cumfreq;
+ a->range_high = a->range_low + range*freq;
+ if (a->range_low < old)
+ stb__arith_carry(a);
+ while (a->range_high - a->range_low < 0x1000000)
+ stb__renorm_encoder(a);
+}
+
+unsigned int stb_arith_decode_value(stb_arith *a, unsigned int totalfreq)
+{
+ unsigned int freqsize = a->range / totalfreq;
+ unsigned int z = a->code / freqsize;
+ return z >= totalfreq ? totalfreq-1 : z;
+}
+
+void stb_arith_decode_advance(stb_arith *a, unsigned int totalfreq, unsigned int freq, unsigned int cumfreq)
+{
+ unsigned int freqsize = a->range / totalfreq; // @OPTIMIZE, share with above divide somehow?
+ a->code -= freqsize * cumfreq;
+ a->range = freqsize * freq;
+ while (a->range < 0x1000000)
+ stb__renorm_decoder(a);
+}
+
+unsigned int stb_arith_decode_value_log2(stb_arith *a, unsigned int totalfreq2)
+{
+ unsigned int freqsize = a->range >> totalfreq2;
+ unsigned int z = a->code / freqsize;
+   return z >= (1U << totalfreq2) ? (1U << totalfreq2)-1 : z;
+}
+
+void stb_arith_decode_advance_log2(stb_arith *a, unsigned int totalfreq2, unsigned int freq, unsigned int cumfreq)
+{
+   unsigned int freqsize = a->range >> totalfreq2;
+   a->code -= freqsize * cumfreq;
+ a->range = freqsize * freq;
+ while (a->range < 0x1000000)
+ stb__renorm_decoder(a);
+}
+
+stbfile *stb_arith_encode_close(stb_arith *a)
+{
+ // put exactly as many bytes as we'll read, so we can turn on/off arithmetic coding in a stream
+ stb__arith_putbyte(a, a->range_low >> 24);
+ stb__arith_putbyte(a, a->range_low >> 16);
+ stb__arith_putbyte(a, a->range_low >> 8);
+ stb__arith_putbyte(a, a->range_low >> 0);
+ stb__arith_flush(a);
+ return a->output;
+}
+
+stbfile *stb_arith_decode_close(stb_arith *a)
+{
+ return a->output;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Threads
+//
+
+#ifndef _WIN32
+#ifdef STB_THREADS
+#error "threads not implemented except for Windows"
+#endif
+#endif
+
+// call this function to free any global variables for memory testing
+STB_EXTERN void stb_thread_cleanup(void);
+
+typedef void * (*stb_thread_func)(void *);
+
+// do not rely on these types, this is an implementation detail.
+// compare against STB_THREAD_NULL and STB_SEMAPHORE_NULL
+typedef void *stb_thread;
+typedef void *stb_semaphore;
+typedef void *stb_mutex;
+typedef struct stb__sync *stb_sync;
+
+#define STB_SEMAPHORE_NULL NULL
+#define STB_THREAD_NULL NULL
+#define STB_MUTEX_NULL NULL
+#define STB_SYNC_NULL NULL
+
+// get the number of processors (limited to those in the affinity mask for this process).
+STB_EXTERN int stb_processor_count(void);
+// force to run on a single core -- needed for RDTSC to work, e.g. for iprof
+STB_EXTERN void stb_force_uniprocessor(void);
+
+// stb_work functions: queue up work to be done by some worker threads
+
+// set number of threads to serve the queue; you can change this on the fly,
+// but if you decrease it, it won't decrease until things currently on the
+// queue are finished
+STB_EXTERN void stb_work_numthreads(int n);
+// set maximum number of units in the queue; you can only set this BEFORE running any work functions
+STB_EXTERN int stb_work_maxunits(int n);
+// enqueue some work to be done (can do this from any thread, or even from a piece of work);
+// return value of f is stored in *return_code if non-NULL
+STB_EXTERN int stb_work(stb_thread_func f, void *d, volatile void **return_code);
+// as above, but stb_sync_reach is called on 'rel' after work is complete
+STB_EXTERN int stb_work_reach(stb_thread_func f, void *d, volatile void **return_code, stb_sync rel);
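+
+// Example usage (illustrative sketch only; 'do_task' and 'my_data' are
+// hypothetical):
+//
+//    void *do_task(void *arg) { /* ... */ return NULL; }
+//
+//    stb_work_numthreads(2);            // serve the queue with two threads
+//    stb_work(do_task, my_data, NULL);  // enqueue; runs on a worker thread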
+
+
+// necessary to call this when using volatile to order writes/reads
+STB_EXTERN void stb_barrier(void);
+
+// support for independent queues with their own threads
+
+typedef struct stb__workqueue stb_workqueue;
+
+STB_EXTERN stb_workqueue *stb_workq_new(int numthreads, int max_units);
+STB_EXTERN stb_workqueue *stb_workq_new_flags(int numthreads, int max_units, int no_add_mutex, int no_remove_mutex);
+STB_EXTERN void stb_workq_delete(stb_workqueue *q);
+STB_EXTERN void stb_workq_numthreads(stb_workqueue *q, int n);
+STB_EXTERN int stb_workq(stb_workqueue *q, stb_thread_func f, void *d, volatile void **return_code);
+STB_EXTERN int stb_workq_reach(stb_workqueue *q, stb_thread_func f, void *d, volatile void **return_code, stb_sync rel);
+STB_EXTERN int stb_workq_length(stb_workqueue *q);
+
+STB_EXTERN stb_thread stb_create_thread (stb_thread_func f, void *d);
+STB_EXTERN stb_thread stb_create_thread2(stb_thread_func f, void *d, volatile void **return_code, stb_semaphore rel);
+STB_EXTERN void stb_destroy_thread(stb_thread t);
+
+STB_EXTERN stb_semaphore stb_sem_new(int max_val);
+STB_EXTERN stb_semaphore stb_sem_new_extra(int max_val, int start_val);
+STB_EXTERN void stb_sem_delete (stb_semaphore s);
+STB_EXTERN void stb_sem_waitfor(stb_semaphore s);
+STB_EXTERN void stb_sem_release(stb_semaphore s);
+
+STB_EXTERN stb_mutex stb_mutex_new(void);
+STB_EXTERN void stb_mutex_delete(stb_mutex m);
+STB_EXTERN void stb_mutex_begin(stb_mutex m);
+STB_EXTERN void stb_mutex_end(stb_mutex m);
+
+STB_EXTERN stb_sync stb_sync_new(void);
+STB_EXTERN void stb_sync_delete(stb_sync s);
+STB_EXTERN int stb_sync_set_target(stb_sync s, int count);
+STB_EXTERN void stb_sync_reach_and_wait(stb_sync s); // wait for 'target' reachers
+STB_EXTERN int stb_sync_reach(stb_sync s);
+
+typedef struct stb__threadqueue stb_threadqueue;
+#define STB_THREADQ_DYNAMIC 0
+STB_EXTERN stb_threadqueue *stb_threadq_new(int item_size, int num_items, int many_add, int many_remove);
+STB_EXTERN void stb_threadq_delete(stb_threadqueue *tq);
+STB_EXTERN int stb_threadq_get(stb_threadqueue *tq, void *output);
+STB_EXTERN void stb_threadq_get_block(stb_threadqueue *tq, void *output);
+STB_EXTERN int stb_threadq_add(stb_threadqueue *tq, void *input);
+// can return FALSE if STB_THREADQ_DYNAMIC and attempt to grow fails
+STB_EXTERN int stb_threadq_add_block(stb_threadqueue *tq, void *input);
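+
+// Example usage (illustrative sketch only): a fixed-capacity queue of ints
+// with one adding thread and one removing thread (hence the 0,0 flags):
+//
+//    stb_threadqueue *q = stb_threadq_new(sizeof(int), 128, 0, 0);
+//    int v = 17;
+//    stb_threadq_add_block(q, &v);    // producer side
+//    int out;
+//    stb_threadq_get_block(q, &out);  // consumer side; blocks until available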
+
+#ifdef STB_THREADS
+#ifdef STB_DEFINE
+
+typedef struct
+{
+ stb_thread_func f;
+ void *d;
+ volatile void **return_val;
+ stb_semaphore sem;
+} stb__thread;
+
+// this is initialized along all possible paths to create threads, therefore
+// it's always initialized before any other threads are created, therefore
+// it's free of races AS LONG AS you only create threads through stb_*
+static stb_mutex stb__threadmutex, stb__workmutex;
+
+static void stb__threadmutex_init(void)
+{
+ if (stb__threadmutex == STB_SEMAPHORE_NULL) {
+ stb__threadmutex = stb_mutex_new();
+ stb__workmutex = stb_mutex_new();
+ }
+}
+
+#ifdef STB_THREAD_TEST
+volatile float stb__t1=1, stb__t2;
+
+static void stb__wait(int n)
+{
+ float z = 0;
+ int i;
+ for (i=0; i < n; ++i)
+ z += 1 / (stb__t1+i);
+ stb__t2 = z;
+}
+#else
+#define stb__wait(x)
+#endif
+
+#ifdef _WIN32
+
+// avoid including windows.h -- note that our definitions aren't
+// exactly the same (we don't define the security descriptor struct)
+// so if you want to include windows.h, make sure you do it first.
+#include <process.h>        // _beginthread
+
+#ifndef _WINDOWS_ // check windows.h guard
+#define STB__IMPORT STB_EXTERN __declspec(dllimport)
+#define STB__DW unsigned long
+
+STB__IMPORT int __stdcall TerminateThread(void *, STB__DW);
+STB__IMPORT void * __stdcall CreateSemaphoreA(void *sec, long,long,char*);
+STB__IMPORT int __stdcall CloseHandle(void *);
+STB__IMPORT STB__DW __stdcall WaitForSingleObject(void *, STB__DW);
+STB__IMPORT int __stdcall ReleaseSemaphore(void *, long, long *);
+STB__IMPORT void __stdcall Sleep(STB__DW);
+#endif
+
+// necessary to call this when using volatile to order writes/reads
+void stb_barrier(void)
+{
+ #ifdef MemoryBarrier
+ MemoryBarrier();
+ #else
+ long temp;
+ __asm xchg temp,eax;
+ #endif
+}
+
+static void stb__thread_run(void *t)
+{
+ void *res;
+ stb__thread info = * (stb__thread *) t;
+ free(t);
+ res = info.f(info.d);
+ if (info.return_val)
+ *info.return_val = res;
+ if (info.sem != STB_SEMAPHORE_NULL)
+ stb_sem_release(info.sem);
+}
+
+static stb_thread stb_create_thread_raw(stb_thread_func f, void *d, volatile void **return_code, stb_semaphore rel)
+{
+#ifdef _MT
+#if defined(STB_FASTMALLOC) && !defined(STB_FASTMALLOC_ITS_OKAY_I_ONLY_MALLOC_IN_ONE_THREAD)
+ stb_fatal("Error! Cannot use STB_FASTMALLOC with threads.\n");
+ return STB_THREAD_NULL;
+#else
+ unsigned long id;
+ stb__thread *data = (stb__thread *) malloc(sizeof(*data));
+ if (!data) return NULL;
+ stb__threadmutex_init();
+ data->f = f;
+ data->d = d;
+ data->return_val = return_code;
+ data->sem = rel;
+ id = _beginthread(stb__thread_run, 0, data);
+ if (id == -1) return NULL;
+ return (void *) id;
+#endif
+#else
+#ifdef STB_NO_STB_STRINGS
+ stb_fatal("Invalid compilation");
+#else
+   stb_fatal("Must compile multi-threaded to use stb_thread/stb_work.");
+#endif
+ return NULL;
+#endif
+}
+
+// trivial win32 wrappers
+void stb_destroy_thread(stb_thread t) { TerminateThread(t,0); }
+stb_semaphore stb_sem_new(int maxv) {return CreateSemaphoreA(NULL,0,maxv,NULL); }
+stb_semaphore stb_sem_new_extra(int maxv,int start){return CreateSemaphoreA(NULL,start,maxv,NULL); }
+void stb_sem_delete(stb_semaphore s) { if (s != NULL) CloseHandle(s); }
+void stb_sem_waitfor(stb_semaphore s) { WaitForSingleObject(s, 0xffffffff); } // INFINITE
+void stb_sem_release(stb_semaphore s) { ReleaseSemaphore(s,1,NULL); }
+static void stb__thread_sleep(int ms) { Sleep(ms); }
+
+#ifndef _WINDOWS_
+STB__IMPORT int __stdcall GetProcessAffinityMask(void *, STB__DW *, STB__DW *);
+STB__IMPORT void * __stdcall GetCurrentProcess(void);
+STB__IMPORT int __stdcall SetProcessAffinityMask(void *, STB__DW);
+#endif
+
+int stb_processor_count(void)
+{
+ unsigned long proc,sys;
+ GetProcessAffinityMask(GetCurrentProcess(), &proc, &sys);
+ return stb_bitcount(proc);
+}
+
+void stb_force_uniprocessor(void)
+{
+ unsigned long proc,sys;
+ GetProcessAffinityMask(GetCurrentProcess(), &proc, &sys);
+ if (stb_bitcount(proc) > 1) {
+ int z;
+ for (z=0; z < 32; ++z)
+ if (proc & (1 << z))
+ break;
+ if (z < 32) {
+ proc = 1 << z;
+ SetProcessAffinityMask(GetCurrentProcess(), proc);
+ }
+ }
+}
+
+#ifdef _WINDOWS_
+#define STB_MUTEX_NATIVE
+void *stb_mutex_new(void)
+{
+ CRITICAL_SECTION *p = (CRITICAL_SECTION *) malloc(sizeof(*p));
+ if (p)
+#if _WIN32_WINNT >= 0x0500
+ InitializeCriticalSectionAndSpinCount(p, 500);
+#else
+ InitializeCriticalSection(p);
+#endif
+ return p;
+}
+
+void stb_mutex_delete(void *p)
+{
+ if (p) {
+ DeleteCriticalSection((CRITICAL_SECTION *) p);
+ free(p);
+ }
+}
+
+void stb_mutex_begin(void *p)
+{
+ stb__wait(500);
+ if (p)
+ EnterCriticalSection((CRITICAL_SECTION *) p);
+}
+
+void stb_mutex_end(void *p)
+{
+ if (p)
+ LeaveCriticalSection((CRITICAL_SECTION *) p);
+ stb__wait(500);
+}
+#endif // _WINDOWS_
+
+#if 0
+// for future reference,
+// InterlockedCompareExchange for x86:
+ int cas64_mp(void * dest, void * xcmp, void * xxchg) {
+ __asm
+ {
+ mov esi, [xxchg] ; exchange
+ mov ebx, [esi + 0]
+ mov ecx, [esi + 4]
+
+ mov esi, [xcmp] ; comparand
+ mov eax, [esi + 0]
+ mov edx, [esi + 4]
+
+ mov edi, [dest] ; destination
+ lock cmpxchg8b [edi]
+ jz yyyy;
+
+ mov [esi + 0], eax;
+ mov [esi + 4], edx;
+
+yyyy:
+ xor eax, eax;
+ setz al;
+ };
+
+inline unsigned __int64 _InterlockedCompareExchange64(volatile unsigned __int64 *dest
+ ,unsigned __int64 exchange
+ ,unsigned __int64 comperand)
+{
+ //value returned in eax::edx
+ __asm {
+ lea esi,comperand;
+ lea edi,exchange;
+
+ mov eax,[esi];
+ mov edx,4[esi];
+ mov ebx,[edi];
+ mov ecx,4[edi];
+ mov esi,dest;
+ lock CMPXCHG8B [esi];
+ }
+#endif // #if 0
+
+#endif // _WIN32
+
+stb_thread stb_create_thread2(stb_thread_func f, void *d, volatile void **return_code, stb_semaphore rel)
+{
+ return stb_create_thread_raw(f,d,return_code,rel);
+}
+
+stb_thread stb_create_thread(stb_thread_func f, void *d)
+{
+ return stb_create_thread2(f,d,NULL,STB_SEMAPHORE_NULL);
+}
+
+// mutex implemented by wrapping semaphore
+#ifndef STB_MUTEX_NATIVE
+stb_mutex stb_mutex_new(void) { return stb_sem_new_extra(1,1); }
+void stb_mutex_delete(stb_mutex m) { stb_sem_delete (m); }
+void stb_mutex_begin(stb_mutex m) { stb__wait(500); if (m) stb_sem_waitfor(m); }
+void stb_mutex_end(stb_mutex m) { if (m) stb_sem_release(m); stb__wait(500); }
+#endif
+
+// thread merge operation
+struct stb__sync
+{
+ int target; // target number of threads to hit it
+ int sofar; // total threads that hit it
+ int waiting; // total threads waiting
+
+ stb_mutex start; // mutex to prevent starting again before finishing previous
+ stb_mutex mutex; // mutex while tweaking state
+ stb_semaphore release; // semaphore wake up waiting threads
+ // we have to wake them up one at a time, rather than using a single release
+ // call, because win32 semaphores don't let you dynamically change the max count!
+};
+
+stb_sync stb_sync_new(void)
+{
+ stb_sync s = (stb_sync) malloc(sizeof(*s));
+ if (!s) return s;
+
+ s->target = s->sofar = s->waiting = 0;
+ s->mutex = stb_mutex_new();
+ s->start = stb_mutex_new();
+ s->release = stb_sem_new(1);
+ if (s->mutex == STB_MUTEX_NULL || s->release == STB_SEMAPHORE_NULL || s->start == STB_MUTEX_NULL) {
+ stb_mutex_delete(s->mutex);
+      stb_mutex_delete(s->start);
+ stb_sem_delete(s->release);
+ free(s);
+ return NULL;
+ }
+ return s;
+}
+
+void stb_sync_delete(stb_sync s)
+{
+ if (s->waiting) {
+ // it's bad to delete while there are threads waiting!
+ // shall we wait for them to reach, or just bail? just bail
+ assert(0);
+ }
+   stb_mutex_delete(s->mutex);
+   stb_mutex_delete(s->start);
+   stb_sem_delete(s->release);
+   free(s);
+}
+
+int stb_sync_set_target(stb_sync s, int count)
+{
+ // don't allow setting a target until the last one is fully released;
+ // note that this can lead to inefficient pipelining, and maybe we'd
+ // be better off ping-ponging between two internal syncs?
+ // I tried seeing how often this happened using TryEnterCriticalSection
+ // and could _never_ get it to happen in imv(stb), even with more threads
+ // than processors. So who knows!
+ stb_mutex_begin(s->start);
+
+ // this mutex is pointless, since it's not valid for threads
+ // to call reach() before anyone calls set_target() anyway
+ stb_mutex_begin(s->mutex);
+
+ assert(s->target == 0); // enforced by start mutex
+ s->target = count;
+ s->sofar = 0;
+ s->waiting = 0;
+ stb_mutex_end(s->mutex);
+ return STB_TRUE;
+}
+
+void stb__sync_release(stb_sync s)
+{
+ if (s->waiting)
+ stb_sem_release(s->release);
+ else {
+ s->target = 0;
+ stb_mutex_end(s->start);
+ }
+}
+
+int stb_sync_reach(stb_sync s)
+{
+ int n;
+ stb_mutex_begin(s->mutex);
+ assert(s->sofar < s->target);
+ n = ++s->sofar; // record this value to avoid possible race if we did 'return s->sofar';
+ if (s->sofar == s->target)
+ stb__sync_release(s);
+ stb_mutex_end(s->mutex);
+ return n;
+}
+
+void stb_sync_reach_and_wait(stb_sync s)
+{
+ stb_mutex_begin(s->mutex);
+ assert(s->sofar < s->target);
+ ++s->sofar;
+ if (s->sofar == s->target) {
+ stb__sync_release(s);
+ stb_mutex_end(s->mutex);
+ } else {
+ ++s->waiting; // we're waiting, so one more waiter
+ stb_mutex_end(s->mutex); // release the mutex to other threads
+
+ stb_sem_waitfor(s->release); // wait for merge completion
+
+ stb_mutex_begin(s->mutex); // on merge completion, grab the mutex
+ --s->waiting; // we're done waiting
+ stb__sync_release(s); // restart the next waiter
+ stb_mutex_end(s->mutex); // and now we're done
+ // this ends the same as the first case, but it's a lot
+ // clearer to understand without sharing the code
+ }
+}
+
+struct stb__threadqueue
+{
+ stb_mutex add, remove;
+ stb_semaphore nonempty, nonfull;
+ int head_blockers; // number of threads blocking--used to know whether to release(avail)
+ int tail_blockers;
+ int head, tail, array_size, growable;
+ int item_size;
+ char *data;
+};
+
+static int stb__tq_wrap(volatile stb_threadqueue *z, int p)
+{
+ if (p == z->array_size)
+ return p - z->array_size;
+ else
+ return p;
+}
+
+int stb__threadq_get_raw(stb_threadqueue *tq2, void *output, int block)
+{
+ volatile stb_threadqueue *tq = (volatile stb_threadqueue *) tq2;
+ if (tq->head == tq->tail && !block) return 0;
+
+ stb_mutex_begin(tq->remove);
+
+ while (tq->head == tq->tail) {
+ if (!block) {
+ stb_mutex_end(tq->remove);
+ return 0;
+ }
+ ++tq->head_blockers;
+ stb_mutex_end(tq->remove);
+
+ stb_sem_waitfor(tq->nonempty);
+
+ stb_mutex_begin(tq->remove);
+ --tq->head_blockers;
+ }
+
+ memcpy(output, tq->data + tq->head*tq->item_size, tq->item_size);
+ stb_barrier();
+ tq->head = stb__tq_wrap(tq, tq->head+1);
+
+ stb_sem_release(tq->nonfull);
+ if (tq->head_blockers) // can't check if actually non-empty due to race?
+ stb_sem_release(tq->nonempty); // if there are other blockers, wake one
+
+ stb_mutex_end(tq->remove);
+ return STB_TRUE;
+}
+
+int stb__threadq_grow(volatile stb_threadqueue *tq)
+{
+ int n;
+ char *p;
+ assert(tq->remove != STB_MUTEX_NULL); // must have this to allow growth!
+ stb_mutex_begin(tq->remove);
+
+ n = tq->array_size * 2;
+ p = (char *) realloc(tq->data, n * tq->item_size);
+ if (p == NULL) {
+ stb_mutex_end(tq->remove);
+ stb_mutex_end(tq->add);
+ return STB_FALSE;
+ }
+ if (tq->tail < tq->head) {
+ memcpy(p + tq->array_size * tq->item_size, p, tq->tail * tq->item_size);
+ tq->tail += tq->array_size;
+ }
+ tq->data = p;
+ tq->array_size = n;
+
+ stb_mutex_end(tq->remove);
+ return STB_TRUE;
+}
+
+int stb__threadq_add_raw(stb_threadqueue *tq2, void *input, int block)
+{
+ int tail,pos;
+ volatile stb_threadqueue *tq = (volatile stb_threadqueue *) tq2;
+ stb_mutex_begin(tq->add);
+ for(;;) {
+ pos = tq->tail;
+ tail = stb__tq_wrap(tq, pos+1);
+ if (tail != tq->head) break;
+
+ // full
+ if (tq->growable) {
+ if (!stb__threadq_grow(tq)) {
+ stb_mutex_end(tq->add);
+ return STB_FALSE; // out of memory
+ }
+ } else if (!block) {
+ stb_mutex_end(tq->add);
+ return STB_FALSE;
+ } else {
+ ++tq->tail_blockers;
+ stb_mutex_end(tq->add);
+
+ stb_sem_waitfor(tq->nonfull);
+
+ stb_mutex_begin(tq->add);
+ --tq->tail_blockers;
+ }
+ }
+ memcpy(tq->data + tq->item_size * pos, input, tq->item_size);
+ stb_barrier();
+ tq->tail = tail;
+ stb_sem_release(tq->nonempty);
+ if (tq->tail_blockers) // can't check if actually non-full due to race?
+ stb_sem_release(tq->nonfull);
+ stb_mutex_end(tq->add);
+ return STB_TRUE;
+}
+
+int stb_threadq_length(stb_threadqueue *tq2)
+{
+ int a,b,n;
+ volatile stb_threadqueue *tq = (volatile stb_threadqueue *) tq2;
+ stb_mutex_begin(tq->add);
+ a = tq->head;
+ b = tq->tail;
+ n = tq->array_size;
+ stb_mutex_end(tq->add);
+ if (a > b) b += n;
+ return b-a;
+}
+
+int stb_threadq_get(stb_threadqueue *tq, void *output)
+{
+ return stb__threadq_get_raw(tq, output, STB_FALSE);
+}
+
+void stb_threadq_get_block(stb_threadqueue *tq, void *output)
+{
+ stb__threadq_get_raw(tq, output, STB_TRUE);
+}
+
+int stb_threadq_add(stb_threadqueue *tq, void *input)
+{
+ return stb__threadq_add_raw(tq, input, STB_FALSE);
+}
+
+int stb_threadq_add_block(stb_threadqueue *tq, void *input)
+{
+ return stb__threadq_add_raw(tq, input, STB_TRUE);
+}
+
+void stb_threadq_delete(stb_threadqueue *tq)
+{
+ if (tq) {
+ free(tq->data);
+ stb_mutex_delete(tq->add);
+ stb_mutex_delete(tq->remove);
+ stb_sem_delete(tq->nonempty);
+ stb_sem_delete(tq->nonfull);
+ free(tq);
+ }
+}
+
+#define STB_THREADQUEUE_DYNAMIC 0
+stb_threadqueue *stb_threadq_new(int item_size, int num_items, int many_add, int many_remove)
+{
+ int error=0;
+ stb_threadqueue *tq = (stb_threadqueue *) malloc(sizeof(*tq));
+ if (tq == NULL) return NULL;
+
+ if (num_items == STB_THREADQUEUE_DYNAMIC) {
+ tq->growable = STB_TRUE;
+ num_items = 32;
+ } else
+ tq->growable = STB_FALSE;
+
+ tq->item_size = item_size;
+ tq->array_size = num_items+1;
+
+ tq->add = tq->remove = STB_MUTEX_NULL;
+ tq->nonempty = tq->nonfull = STB_SEMAPHORE_NULL;
+ tq->data = NULL;
+ if (many_add)
+ { tq->add = stb_mutex_new(); if (tq->add == STB_MUTEX_NULL) goto error; }
+ if (many_remove || tq->growable)
+ { tq->remove = stb_mutex_new(); if (tq->remove == STB_MUTEX_NULL) goto error; }
+ tq->nonempty = stb_sem_new(1); if (tq->nonempty == STB_SEMAPHORE_NULL) goto error;
+ tq->nonfull = stb_sem_new(1); if (tq->nonfull == STB_SEMAPHORE_NULL) goto error;
+ tq->data = (char *) malloc(tq->item_size * tq->array_size);
+ if (tq->data == NULL) goto error;
+
+ tq->head = tq->tail = 0;
+ tq->head_blockers = tq->tail_blockers = 0;
+
+ return tq;
+
+error:
+ stb_threadq_delete(tq);
+ return NULL;
+}
+
+typedef struct
+{
+ stb_thread_func f;
+ void *d;
+ volatile void **retval;
+ stb_sync sync;
+} stb__workinfo;
+
+
+struct stb__workqueue
+{
+ int numthreads;
+ stb_threadqueue *tq;
+};
+
+static stb_workqueue *stb__work_global;
+
+static void *stb__thread_workloop(void *p)
+{
+ volatile stb_workqueue *q = (volatile stb_workqueue *) p;
+ for(;;) {
+ void *z;
+ stb__workinfo w;
+ stb_threadq_get_block(q->tq, &w);
+ if (w.f == NULL) // null work is a signal to end the thread
+ return NULL;
+ z = w.f(w.d);
+ if (w.retval) { stb_barrier(); *w.retval = z; }
+ if (w.sync != STB_SYNC_NULL) stb_sync_reach(w.sync);
+ }
+}
+
+stb_workqueue *stb_workq_new(int num_threads, int max_units)
+{
+ return stb_workq_new_flags(num_threads, max_units, 0,0);
+}
+
+stb_workqueue *stb_workq_new_flags(int numthreads, int max_units, int no_add_mutex, int no_remove_mutex)
+{
+ stb_workqueue *q = (stb_workqueue *) malloc(sizeof(*q));
+ if (q == NULL) return NULL;
+ q->tq = stb_threadq_new(sizeof(stb__workinfo), max_units, !no_add_mutex, !no_remove_mutex);
+ if (q->tq == NULL) { free(q); return NULL; }
+ q->numthreads = 0;
+ stb_workq_numthreads(q, numthreads);
+ return q;
+}
+
+void stb_workq_delete(stb_workqueue *q)
+{
+ while (stb_workq_length(q) != 0)
+ stb__thread_sleep(1);
+ stb_threadq_delete(q->tq);
+ free(q);
+}
+
+static int stb__work_maxitems = STB_THREADQUEUE_DYNAMIC;
+
+static void stb_work_init(int num_threads)
+{
+ if (stb__work_global == NULL) {
+ stb__threadmutex_init();
+ stb_mutex_begin(stb__workmutex);
+ stb_barrier();
+ if (*(stb_workqueue * volatile *) &stb__work_global == NULL)
+ stb__work_global = stb_workq_new(num_threads, stb__work_maxitems);
+ stb_mutex_end(stb__workmutex);
+ }
+}
+
+static int stb__work_raw(stb_workqueue *q, stb_thread_func f, void *d, volatile void **return_code, stb_sync rel)
+{
+ stb__workinfo w;
+ if (q == NULL) {
+ stb_work_init(1);
+ q = stb__work_global;
+ }
+ w.f = f;
+ w.d = d;
+ w.retval = return_code;
+ w.sync = rel;
+ return stb_threadq_add(q->tq, &w);
+}
+
+int stb_workq_length(stb_workqueue *q)
+{
+ return stb_threadq_length(q->tq);
+}
+
+int stb_workq(stb_workqueue *q, stb_thread_func f, void *d, volatile void **return_code)
+{
+ if (f == NULL) return 0;
+ return stb_workq_reach(q, f, d, return_code, NULL);
+}
+
+int stb_workq_reach(stb_workqueue *q, stb_thread_func f, void *d, volatile void **return_code, stb_sync rel)
+{
+ if (f == NULL) return 0;
+ return stb__work_raw(q, f, d, return_code, rel);
+}
+
+static void stb__workq_numthreads(stb_workqueue *q, int n)
+{
+ while (q->numthreads < n) {
+ stb_create_thread(stb__thread_workloop, q);
+ ++q->numthreads;
+ }
+ while (q->numthreads > n) {
+ stb__work_raw(q, NULL, NULL, NULL, NULL);
+ --q->numthreads;
+ }
+}
+
+void stb_workq_numthreads(stb_workqueue *q, int n)
+{
+ stb_mutex_begin(stb__threadmutex);
+ stb__workq_numthreads(q,n);
+ stb_mutex_end(stb__threadmutex);
+}
+
+int stb_work_maxunits(int n)
+{
+ if (stb__work_global == NULL) {
+ stb__work_maxitems = n;
+ stb_work_init(1);
+ }
+ return stb__work_maxitems;
+}
+
+int stb_work(stb_thread_func f, void *d, volatile void **return_code)
+{
+ return stb_workq(stb__work_global, f,d,return_code);
+}
+
+int stb_work_reach(stb_thread_func f, void *d, volatile void **return_code, stb_sync rel)
+{
+ return stb_workq_reach(stb__work_global, f,d,return_code,rel);
+}
+
+void stb_work_numthreads(int n)
+{
+ if (stb__work_global == NULL)
+ stb_work_init(n);
+ else
+ stb_workq_numthreads(stb__work_global, n);
+}
+#endif // STB_DEFINE
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Background disk I/O
+//
+//
+
+#define STB_BGIO_READ_ALL (-1)
+STB_EXTERN int stb_bgio_read (char *filename, int offset, int len, stb_uchar **result, int *olen);
+STB_EXTERN int stb_bgio_readf (FILE *f , int offset, int len, stb_uchar **result, int *olen);
+STB_EXTERN int stb_bgio_read_to (char *filename, int offset, int len, stb_uchar *buffer, int *olen);
+STB_EXTERN int stb_bgio_readf_to(FILE *f , int offset, int len, stb_uchar *buffer, int *olen);
+
+typedef struct
+{
+ int have_data;
+ int is_valid;
+ int is_dir;
+ time_t filetime;
+ stb_int64 filesize;
+} stb_bgstat;
+
+STB_EXTERN int stb_bgio_stat (char *filename, stb_bgstat *result);
+
+#ifdef STB_DEFINE
+
+static stb_workqueue *stb__diskio;
+static stb_mutex stb__diskio_mutex;
+
+void stb_thread_cleanup(void)
+{
+ if (stb__work_global) stb_workq_delete(stb__work_global);
+ stb__work_global = NULL;
+ if (stb__threadmutex) stb_mutex_delete(stb__threadmutex);
+ stb__threadmutex = NULL;
+ if (stb__workmutex) stb_mutex_delete(stb__workmutex);
+ stb__workmutex = NULL;
+ if (stb__diskio) stb_workq_delete(stb__diskio);
+ stb__diskio = NULL;
+ if (stb__diskio_mutex) stb_mutex_delete(stb__diskio_mutex);
+ stb__diskio_mutex = NULL;
+}
+
+
+typedef struct
+{
+ char *filename;
+ FILE *f;
+ int offset;
+ int len;
+
+ stb_bgstat *stat_out;
+ stb_uchar *output;
+ stb_uchar **result;
+ int *len_output;
+ int *flag;
+} stb__disk_command;
+
+#define STB__MAX_DISK_COMMAND 100
+static stb__disk_command stb__dc_queue[STB__MAX_DISK_COMMAND];
+static int stb__dc_offset;
+
+void stb__io_init(void)
+{
+ if (!stb__diskio) {
+ stb__threadmutex_init();
+ stb_mutex_begin(stb__threadmutex);
+ stb_barrier();
+ if (*(stb_thread * volatile *) &stb__diskio == NULL) {
+ stb__diskio_mutex = stb_mutex_new();
+ // use many threads so OS can try to schedule seeks
+ stb__diskio = stb_workq_new_flags(16,STB__MAX_DISK_COMMAND,STB_FALSE,STB_FALSE);
+ }
+ stb_mutex_end(stb__threadmutex);
+ }
+}
+
+static void * stb__io_error(stb__disk_command *dc)
+{
+ if (dc->len_output) *dc->len_output = 0;
+ if (dc->result) *dc->result = NULL;
+ if (dc->flag) *dc->flag = -1;
+ return NULL;
+}
+
+static void * stb__io_task(void *p)
+{
+ stb__disk_command *dc = (stb__disk_command *) p;
+ int len;
+ FILE *f;
+ stb_uchar *buf;
+
+ if (dc->stat_out) {
+ struct _stati64 s;
+ if (!_stati64(dc->filename, &s)) {
+ dc->stat_out->filesize = s.st_size;
+ dc->stat_out->filetime = s.st_mtime;
+ dc->stat_out->is_dir = s.st_mode & _S_IFDIR;
+ dc->stat_out->is_valid = (s.st_mode & _S_IFREG) || dc->stat_out->is_dir;
+ } else
+ dc->stat_out->is_valid = 0;
+ stb_barrier();
+ dc->stat_out->have_data = 1;
+ free(dc->filename);
+ return 0;
+ }
+ if (dc->f) {
+ #ifdef WIN32
+ f = _fdopen(_dup(_fileno(dc->f)), "rb");
+ #else
+ f = fdopen(dup(fileno(dc->f)), "rb");
+ #endif
+ if (!f)
+ return stb__io_error(dc);
+ } else {
+ f = fopen(dc->filename, "rb");
+ free(dc->filename);
+ if (!f)
+ return stb__io_error(dc);
+ }
+
+ len = dc->len;
+ if (len < 0) {
+ fseek(f, 0, SEEK_END);
+ len = ftell(f) - dc->offset;
+ }
+
+ if (fseek(f, dc->offset, SEEK_SET)) {
+ fclose(f);
+ return stb__io_error(dc);
+ }
+
+ if (dc->output)
+ buf = dc->output;
+ else {
+ buf = (stb_uchar *) malloc(len);
+ if (buf == NULL) {
+ fclose(f);
+ return stb__io_error(dc);
+ }
+ }
+
+ len = fread(buf, 1, len, f);
+ fclose(f);
+ if (dc->len_output) *dc->len_output = len;
+ if (dc->result) *dc->result = buf;
+ if (dc->flag) *dc->flag = 1;
+
+ return NULL;
+}
+
+int stb__io_add(char *fname, FILE *f, int off, int len, stb_uchar *out, stb_uchar **result, int *olen, int *flag, stb_bgstat *stat)
+{
+ int res;
+ stb__io_init();
+ // do memory allocation outside of mutex
+ if (fname) fname = stb_p_strdup(fname);
+ stb_mutex_begin(stb__diskio_mutex);
+ {
+ stb__disk_command *dc = &stb__dc_queue[stb__dc_offset];
+ dc->filename = fname;
+ dc->f = f;
+ dc->offset = off;
+ dc->len = len;
+ dc->output = out;
+ dc->result = result;
+ dc->len_output = olen;
+ dc->flag = flag;
+ dc->stat_out = stat;
+ res = stb_workq(stb__diskio, stb__io_task, dc, NULL);
+ if (res)
+ stb__dc_offset = (stb__dc_offset + 1 == STB__MAX_DISK_COMMAND ? 0 : stb__dc_offset+1);
+ }
+ stb_mutex_end(stb__diskio_mutex);
+ return res;
+}
+
+int stb_bgio_read(char *filename, int offset, int len, stb_uchar **result, int *olen)
+{
+ return stb__io_add(filename,NULL,offset,len,NULL,result,olen,NULL,NULL);
+}
+
+int stb_bgio_readf(FILE *f, int offset, int len, stb_uchar **result, int *olen)
+{
+ return stb__io_add(NULL,f,offset,len,NULL,result,olen,NULL,NULL);
+}
+
+int stb_bgio_read_to(char *filename, int offset, int len, stb_uchar *buffer, int *olen)
+{
+ return stb__io_add(filename,NULL,offset,len,buffer,NULL,olen,NULL,NULL);
+}
+
+int stb_bgio_readf_to(FILE *f, int offset, int len, stb_uchar *buffer, int *olen)
+{
+ return stb__io_add(NULL,f,offset,len,buffer,NULL,olen,NULL,NULL);
+}
+
+STB_EXTERN int stb_bgio_stat (char *filename, stb_bgstat *result)
+{
+ result->have_data = 0;
+ return stb__io_add(filename,NULL,0,0,NULL,NULL,NULL,NULL, result);
+}
+#endif
+#endif
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Fast malloc implementation
+//
+// This is a clone of TCMalloc, but without the thread support.
+// 1. large objects are allocated directly, page-aligned
+// 2. small objects are allocated in homogeneous heaps, 0 overhead
+//
+// We keep an allocation table for pages a la TCMalloc. This would
+// require 4MB for the entire address space, but we only allocate
+// the parts that are in use. The overhead from using homogeneous heaps
+// everywhere is 3MB. (That is, if you allocate 1 object of each size,
+// you'll use 3MB.)
+
+#if defined(STB_DEFINE) && ((defined(_WIN32) && !defined(_M_AMD64)) || defined(STB_FASTMALLOC))
+
+#ifdef _WIN32
+ #ifndef _WINDOWS_
+ #ifndef STB__IMPORT
+ #define STB__IMPORT STB_EXTERN __declspec(dllimport)
+ #define STB__DW unsigned long
+ #endif
+ STB__IMPORT void * __stdcall VirtualAlloc(void *p, unsigned long size, unsigned long type, unsigned long protect);
+ STB__IMPORT int __stdcall VirtualFree(void *p, unsigned long size, unsigned long freetype);
+ #endif
+ #define stb__alloc_pages_raw(x) (stb_uint32) VirtualAlloc(NULL, (x), 0x3000, 0x04)
+ #define stb__dealloc_pages_raw(p) VirtualFree((void *) p, 0, 0x8000)
+#else
+ #error "Platform not currently supported"
+#endif
+
+typedef struct stb__span
+{
+ int start, len;
+ struct stb__span *next, *prev;
+ void *first_free;
+ unsigned short list; // 1..256 free; 257..511 sizeclass; 0=large block
+ short allocations; // # outstanding allocations for sizeclass
+} stb__span; // 24 bytes
+
+static stb__span **stb__span_for_page;
+static int stb__firstpage, stb__lastpage;
+static void stb__update_page_range(int first, int last)
+{
+ stb__span **sfp;
+ int i, f,l;
+ if (first >= stb__firstpage && last <= stb__lastpage) return;
+ if (stb__span_for_page == NULL) {
+ f = first;
+ l = f+stb_max(last-f, 16384);
+ l = stb_min(l, 1<<20);
+ } else if (last > stb__lastpage) {
+ f = stb__firstpage;
+ l = f + (stb__lastpage - f) * 2;
+ l = stb_clamp(last, l,1<<20);
+ } else {
+ l = stb__lastpage;
+ f = l - (l - stb__firstpage) * 2;
+ f = stb_clamp(f, 0,first);
+ }
+ sfp = (stb__span **) stb__alloc_pages_raw(sizeof(void *) * (l-f));
+ for (i=f; i < stb__firstpage; ++i) sfp[i - f] = NULL;
+ for ( ; i < stb__lastpage ; ++i) sfp[i - f] = stb__span_for_page[i - stb__firstpage];
+ for ( ; i < l ; ++i) sfp[i - f] = NULL;
+ if (stb__span_for_page) stb__dealloc_pages_raw(stb__span_for_page);
+ stb__firstpage = f;
+ stb__lastpage = l;
+ stb__span_for_page = sfp;
+}
+
+static stb__span *stb__span_free=NULL;
+static stb__span *stb__span_first, *stb__span_end;
+static stb__span *stb__span_alloc(void)
+{
+ stb__span *s = stb__span_free;
+ if (s)
+ stb__span_free = s->next;
+ else {
+ if (!stb__span_first) {
+ stb__span_first = (stb__span *) stb__alloc_pages_raw(65536);
+ if (stb__span_first == NULL) return NULL;
+ stb__span_end = stb__span_first + (65536 / sizeof(stb__span));
+ }
+ s = stb__span_first++;
+ if (stb__span_first == stb__span_end) stb__span_first = NULL;
+ }
+ return s;
+}
+
+static stb__span *stb__spanlist[512];
+
+static void stb__spanlist_unlink(stb__span *s)
+{
+ if (s->prev)
+ s->prev->next = s->next;
+ else {
+ int n = s->list;
+ assert(stb__spanlist[n] == s);
+ stb__spanlist[n] = s->next;
+ }
+ if (s->next)
+ s->next->prev = s->prev;
+ s->next = s->prev = NULL;
+ s->list = 0;
+}
+
+static void stb__spanlist_add(int n, stb__span *s)
+{
+ s->list = n;
+ s->next = stb__spanlist[n];
+ s->prev = NULL;
+ stb__spanlist[n] = s;
+ if (s->next) s->next->prev = s;
+}
+
+#define stb__page_shift 12
+#define stb__page_size (1 << stb__page_shift)
+#define stb__page_number(x) ((x) >> stb__page_shift)
+#define stb__page_address(x) ((x) << stb__page_shift)
+
+static void stb__set_span_for_page(stb__span *s)
+{
+ int i;
+ for (i=0; i < s->len; ++i)
+ stb__span_for_page[s->start + i - stb__firstpage] = s;
+}
+
+static stb__span *stb__coalesce(stb__span *a, stb__span *b)
+{
+ assert(a->start + a->len == b->start);
+ if (a->list) stb__spanlist_unlink(a);
+ if (b->list) stb__spanlist_unlink(b);
+ a->len += b->len;
+ b->len = 0;
+ b->next = stb__span_free;
+ stb__span_free = b;
+ stb__set_span_for_page(a);
+ return a;
+}
+
+static void stb__free_span(stb__span *s)
+{
+ stb__span *n = NULL;
+ if (s->start > stb__firstpage) {
+ n = stb__span_for_page[s->start-1 - stb__firstpage];
+ if (n && n->allocations == -2 && n->start + n->len == s->start) s = stb__coalesce(n,s);
+ }
+ if (s->start + s->len < stb__lastpage) {
+ n = stb__span_for_page[s->start + s->len - stb__firstpage];
+ if (n && n->allocations == -2 && s->start + s->len == n->start) s = stb__coalesce(s,n);
+ }
+ s->allocations = -2;
+ stb__spanlist_add(s->len > 256 ? 256 : s->len, s);
+}
+
+static stb__span *stb__alloc_pages(int num)
+{
+ stb__span *s = stb__span_alloc();
+ int p;
+ if (!s) return NULL;
+ p = stb__alloc_pages_raw(num << stb__page_shift);
+ if (p == 0) { s->next = stb__span_free; stb__span_free = s; return 0; }
+ assert(stb__page_address(stb__page_number(p)) == p);
+ p = stb__page_number(p);
+ stb__update_page_range(p, p+num);
+ s->start = p;
+ s->len = num;
+ s->next = NULL;
+ s->prev = NULL;
+ stb__set_span_for_page(s);
+ return s;
+}
+
+static stb__span *stb__alloc_span(int pagecount)
+{
+ int i;
+ stb__span *p = NULL;
+ for(i=pagecount; i < 256; ++i)
+ if (stb__spanlist[i]) {
+ p = stb__spanlist[i];
+ break;
+ }
+ if (!p) {
+ p = stb__spanlist[256];
+ while (p && p->len < pagecount)
+ p = p->next;
+ }
+ if (!p) {
+ p = stb__alloc_pages(pagecount < 16 ? 16 : pagecount);
+ if (p == NULL) return 0;
+ } else
+ stb__spanlist_unlink(p);
+
+ if (p->len > pagecount) {
+ stb__span *q = stb__span_alloc();
+ if (q) {
+ q->start = p->start + pagecount;
+ q->len = p->len - pagecount;
+ p->len = pagecount;
+ for (i=0; i < q->len; ++i)
+ stb__span_for_page[q->start+i - stb__firstpage] = q;
+ stb__spanlist_add(q->len > 256 ? 256 : q->len, q);
+ }
+ }
+ return p;
+}
+
+#define STB__MAX_SMALL_SIZE 32768
+#define STB__MAX_SIZE_CLASSES 256
+
+static unsigned char stb__class_base[32];
+static unsigned char stb__class_shift[32];
+static unsigned char stb__pages_for_class[STB__MAX_SIZE_CLASSES];
+static int stb__size_for_class[STB__MAX_SIZE_CLASSES];
+
+stb__span *stb__get_nonempty_sizeclass(int c)
+{
+ int s = c + 256, i, size, tsize; // remap to span-list index
+ char *z;
+ void *q;
+ stb__span *p = stb__spanlist[s];
+ if (p) {
+ if (p->first_free) return p; // fast path: it's in the first one in list
+ for (p=p->next; p; p=p->next)
+ if (p->first_free) {
+ // move to front for future queries
+ stb__spanlist_unlink(p);
+ stb__spanlist_add(s, p);
+ return p;
+ }
+ }
+ // no non-empty ones, so allocate a new one
+ p = stb__alloc_span(stb__pages_for_class[c]);
+ if (!p) return NULL;
+ // create the free list up front
+ size = stb__size_for_class[c];
+ tsize = stb__pages_for_class[c] << stb__page_shift;
+ i = 0;
+ z = (char *) stb__page_address(p->start);
+ q = NULL;
+ while (i + size <= tsize) {
+ * (void **) z = q; q = z;
+ z += size;
+ i += size;
+ }
+ p->first_free = q;
+ p->allocations = 0;
+ stb__spanlist_add(s,p);
+ return p;
+}
+
+static int stb__sizeclass(size_t sz)
+{
+ int z = stb_log2_floor(sz); // -1 below to group e.g. 13,14,15,16 correctly
+ return stb__class_base[z] + ((sz-1) >> stb__class_shift[z]);
+}
+
+static void stb__init_sizeclass(void)
+{
+ int i, size, overhead;
+ int align_shift = 2; // allow 4-byte and 12-byte blocks as well, vs. TCMalloc
+ int next_class = 1;
+ int last_log = 0;
+
+ for (i = 0; i < align_shift; i++) {
+ stb__class_base [i] = next_class;
+ stb__class_shift[i] = align_shift;
+ }
+
+ for (size = 1 << align_shift; size <= STB__MAX_SMALL_SIZE; size += 1 << align_shift) {
+ i = stb_log2_floor(size);
+ if (i > last_log) {
+ if (size == 16) ++align_shift; // switch from 4-byte to 8-byte alignment
+ else if (size >= 128 && align_shift < 8) ++align_shift;
+ stb__class_base[i] = next_class - ((size-1) >> align_shift);
+ stb__class_shift[i] = align_shift;
+ last_log = i;
+ }
+ stb__size_for_class[next_class++] = size;
+ }
+
+ for (i=1; i <= STB__MAX_SMALL_SIZE; ++i)
+ assert(i <= stb__size_for_class[stb__sizeclass(i)]);
+
+ overhead = 0;
+ for (i = 1; i < next_class; i++) {
+ int s = stb__size_for_class[i];
+ size = stb__page_size;
+ while (size % s > size >> 3)
+ size += stb__page_size;
+ stb__pages_for_class[i] = (unsigned char) (size >> stb__page_shift);
+ overhead += size;
+ }
+ assert(overhead < (4 << 20)); // make sure it's under 4MB of overhead
+}
+
+#ifdef STB_DEBUG
+#define stb__smemset(a,b,c) memset((void *) a, b, c)
+#elif defined(STB_FASTMALLOC_INIT)
+#define stb__smemset(a,b,c) memset((void *) a, b, c)
+#else
+#define stb__smemset(a,b,c)
+#endif
+void *stb_smalloc(size_t sz)
+{
+ stb__span *s;
+ if (sz == 0) return NULL;
+ if (stb__size_for_class[1] == 0) stb__init_sizeclass();
+ if (sz > STB__MAX_SMALL_SIZE) {
+ s = stb__alloc_span((sz + stb__page_size - 1) >> stb__page_shift);
+ if (s == NULL) return NULL;
+ s->list = 0;
+ s->next = s->prev = NULL;
+ s->allocations = -32767;
+ stb__smemset(stb__page_address(s->start), 0xcd, (sz+3)&~3);
+ return (void *) stb__page_address(s->start);
+ } else {
+ void *p;
+ int c = stb__sizeclass(sz);
+ s = stb__spanlist[256+c];
+ if (!s || !s->first_free)
+ s = stb__get_nonempty_sizeclass(c);
+ if (s == NULL) return NULL;
+ p = s->first_free;
+ s->first_free = * (void **) p;
+ ++s->allocations;
+ stb__smemset(p,0xcd, sz);
+ return p;
+ }
+}
+
+int stb_ssize(void *p)
+{
+ stb__span *s;
+ if (p == NULL) return 0;
+ s = stb__span_for_page[stb__page_number((stb_uint) p) - stb__firstpage];
+ if (s->list >= 256) {
+ return stb__size_for_class[s->list - 256];
+ } else {
+ assert(s->list == 0);
+ return s->len << stb__page_shift;
+ }
+}
+
+void stb_sfree(void *p)
+{
+ stb__span *s;
+ if (p == NULL) return;
+ s = stb__span_for_page[stb__page_number((stb_uint) p) - stb__firstpage];
+ if (s->list >= 256) {
+ stb__smemset(p, 0xfe, stb__size_for_class[s->list-256]);
+ * (void **) p = s->first_free;
+ s->first_free = p;
+ if (--s->allocations == 0) {
+ stb__spanlist_unlink(s);
+ stb__free_span(s);
+ }
+ } else {
+ assert(s->list == 0);
+ stb__smemset(p, 0xfe, stb_ssize(p));
+ stb__free_span(s);
+ }
+}
+
+void *stb_srealloc(void *p, size_t sz)
+{
+ size_t cur_size;
+ if (p == NULL) return stb_smalloc(sz);
+ if (sz == 0) { stb_sfree(p); return NULL; }
+ cur_size = stb_ssize(p);
+ if (sz > cur_size || sz <= (cur_size >> 1)) {
+ void *q;
+ if (sz > cur_size && sz < (cur_size << 1)) sz = cur_size << 1;
+ q = stb_smalloc(sz); if (q == NULL) return NULL;
+ memcpy(q, p, sz < cur_size ? sz : cur_size);
+ stb_sfree(p);
+ return q;
+ }
+ return p;
+}
+
+void *stb_scalloc(size_t n, size_t sz)
+{
+ void *p;
+ if (n == 0 || sz == 0) return NULL;
+ if (stb_log2_ceil(n) + stb_log2_ceil(sz) >= 32) return NULL; // guard against n*sz overflow
+ p = stb_smalloc(n*sz);
+ if (p) memset(p, 0, n*sz);
+ return p;
+}
+
+char *stb_sstrdup(char *s)
+{
+ int n = strlen(s);
+ char *p = (char *) stb_smalloc(n+1);
+ if (p) stb_p_strcpy_s(p,n+1,s);
+ return p;
+}
+#endif // STB_DEFINE
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Source code constants
+//
+// This is a trivial system to let you specify constants in source code,
+// then while running you can change the constants.
+//
+// Note that you can't wrap the #defines, because we need to know their
+// names. So we provide a pre-wrapped version without 'STB_' for convenience;
+// to request it, #define STB_CONVENIENT_H, yielding:
+// KI -- integer
+// KU -- unsigned integer
+// KF -- float
+// KD -- double
+// KS -- string constant
+//
+// By default this functions in debug builds and is compiled out (as plain
+// casts) in release builds. To force it on in all builds, define STB_ALWAYS_H
+
+#ifdef STB_CONVENIENT_H
+#define KI(x) STB_I(x)
+#define KU(x) STB_UI(x)
+#define KF(x) STB_F(x)
+#define KD(x) STB_D(x)
+#define KS(x) STB_S(x)
+#endif
+
+STB_EXTERN void stb_source_path(char *str);
+#ifdef STB_DEFINE
+char *stb__source_path;
+void stb_source_path(char *path)
+{
+ stb__source_path = path;
+}
+
+char *stb__get_sourcefile_path(char *file)
+{
+ static char filebuf[512];
+ if (stb__source_path) {
+ stb_p_sprintf(filebuf stb_p_size(sizeof(filebuf)), "%s/%s", stb__source_path, file);
+ if (stb_fexists(filebuf)) return filebuf;
+ }
+
+ if (stb_fexists(file)) return file;
+
+ stb_p_sprintf(filebuf stb_p_size(sizeof(filebuf)), "../%s", file);
+ if (stb_fexists(filebuf)) return filebuf;
+
+ return file;
+}
+#endif
+
+#define STB_F(x) ((float) STB_H(x))
+#define STB_UI(x) ((unsigned int) STB_I(x))
+
+#if !defined(STB_DEBUG) && !defined(STB_ALWAYS_H)
+#define STB_D(x) ((double) (x))
+#define STB_I(x) ((int) (x))
+#define STB_S(x) ((char *) (x))
+#else
+#define STB_D(x) stb__double_constant(__FILE__, __LINE__-1, (x))
+#define STB_I(x) stb__int_constant(__FILE__, __LINE__-1, (x))
+#define STB_S(x) stb__string_constant(__FILE__, __LINE__-1, (x))
+
+STB_EXTERN double stb__double_constant(char *file, int line, double x);
+STB_EXTERN int stb__int_constant(char *file, int line, int x);
+STB_EXTERN char * stb__string_constant(char *file, int line, char *str);
+
+#ifdef STB_DEFINE
+
+enum
+{
+ STB__CTYPE_int,
+ STB__CTYPE_uint,
+ STB__CTYPE_float,
+ STB__CTYPE_double,
+ STB__CTYPE_string,
+};
+
+typedef struct
+{
+ int line;
+ int type;
+ union {
+ int ival;
+ double dval;
+ char *sval;
+ };
+} stb__Entry;
+
+typedef struct
+{
+ stb__Entry *entries;
+ char *filename;
+ time_t timestamp;
+ char **file_data;
+ int file_len;
+ unsigned short *line_index;
+} stb__FileEntry;
+
+static void stb__constant_parse(stb__FileEntry *f, int i)
+{
+ char *s;
+ int n;
+ if (!stb_arr_valid(f->entries, i)) return;
+ n = f->entries[i].line;
+ if (n >= f->file_len) return;
+ s = f->file_data[n];
+ switch (f->entries[i].type) {
+ case STB__CTYPE_float:
+ while (*s) {
+ if (!strncmp(s, "STB_D(", 6)) { s+=6; goto matched_float; }
+ if (!strncmp(s, "STB_F(", 6)) { s+=6; goto matched_float; }
+ if (!strncmp(s, "KD(", 3)) { s+=3; goto matched_float; }
+ if (!strncmp(s, "KF(", 3)) { s+=3; goto matched_float; }
+ ++s;
+ }
+ break;
+ matched_float:
+ f->entries[i].dval = strtod(s, NULL);
+ break;
+ case STB__CTYPE_int:
+ while (*s) {
+ if (!strncmp(s, "STB_I(", 6)) { s+=6; goto matched_int; }
+ if (!strncmp(s, "STB_UI(", 7)) { s+=7; goto matched_int; }
+ if (!strncmp(s, "KI(", 3)) { s+=3; goto matched_int; }
+ if (!strncmp(s, "KU(", 3)) { s+=3; goto matched_int; }
+ ++s;
+ }
+ break;
+ matched_int: {
+ int neg=0;
+ s = stb_skipwhite(s);
+ while (*s == '-') { neg = !neg; s = stb_skipwhite(s+1); } // handle '- - 5', pointlessly
+ if (s[0] == '0' && tolower(s[1]) == 'x')
+ f->entries[i].ival = strtol(s, NULL, 16);
+ else if (s[0] == '0')
+ f->entries[i].ival = strtol(s, NULL, 8);
+ else
+ f->entries[i].ival = strtol(s, NULL, 10);
+ if (neg) f->entries[i].ival = -f->entries[i].ival;
+ break;
+ }
+ case STB__CTYPE_string:
+ // @TODO
+ break;
+ }
+}
+
+static stb_sdict *stb__constant_file_hash;
+
+stb__Entry *stb__constant_get_entry(char *filename, int line, int type)
+{
+ int i;
+ stb__FileEntry *f;
+ if (stb__constant_file_hash == NULL)
+ stb__constant_file_hash = stb_sdict_new(STB_TRUE);
+ f = (stb__FileEntry*) stb_sdict_get(stb__constant_file_hash, filename);
+ if (f == NULL) {
+ char *s = stb__get_sourcefile_path(filename);
+ if (s == NULL || !stb_fexists(s)) return 0;
+ f = (stb__FileEntry *) malloc(sizeof(*f));
+ f->timestamp = stb_ftimestamp(s);
+ f->file_data = stb_stringfile(s, &f->file_len);
+ f->filename = stb_p_strdup(s); // cache the full path
+ f->entries = NULL;
+ f->line_index = 0;
+ stb_arr_setlen(f->line_index, f->file_len);
+ memset(f->line_index, 0xff, stb_arr_storage(f->line_index));
+ } else {
+ time_t t = stb_ftimestamp(f->filename);
+ if (f->timestamp != t) {
+ f->timestamp = t;
+ free(f->file_data);
+ f->file_data = stb_stringfile(f->filename, &f->file_len);
+ stb_arr_setlen(f->line_index, f->file_len);
+ for (i=0; i < stb_arr_len(f->entries); ++i)
+ stb__constant_parse(f, i);
+ }
+ }
+
+ if (line >= f->file_len) return 0;
+
+ if (f->line_index[line] >= stb_arr_len(f->entries)) {
+ // need a new entry
+ int n = stb_arr_len(f->entries);
+ stb__Entry e;
+ e.line = line;
+ if (line < f->file_len)
+ f->line_index[line] = n;
+ e.type = type;
+ stb_arr_push(f->entries, e);
+ stb__constant_parse(f, n);
+ }
+ return f->entries + f->line_index[line];
+}
+
+double stb__double_constant(char *file, int line, double x)
+{
+ stb__Entry *e = stb__constant_get_entry(file, line, STB__CTYPE_float);
+ if (!e) return x;
+ return e->dval;
+}
+
+int stb__int_constant(char *file, int line, int x)
+{
+ stb__Entry *e = stb__constant_get_entry(file, line, STB__CTYPE_int);
+ if (!e) return x;
+ return e->ival;
+}
+
+char * stb__string_constant(char *file, int line, char *x)
+{
+ stb__Entry *e = stb__constant_get_entry(file, line, STB__CTYPE_string);
+ if (!e) return x;
+ return e->sval;
+}
+
+#endif // STB_DEFINE
+#endif // !STB_DEBUG && !STB_ALWAYS_H
+
+#undef STB_EXTERN
+#endif // STB_INCLUDE_STB_H
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/deprecated/stb_image.c b/vendor/stb/deprecated/stb_image.c
new file mode 100644
index 0000000..de0d935
--- /dev/null
+++ b/vendor/stb/deprecated/stb_image.c
@@ -0,0 +1,4678 @@
+/* stb_image - v1.35 - public domain JPEG/PNG reader - http://nothings.org/stb_image.c
+ when you control the images you're loading
+ no warranty implied; use at your own risk
+
+ QUICK NOTES:
+ Primarily of interest to game developers and other people who can
+ avoid problematic images and only need the trivial interface
+
+ JPEG baseline (no JPEG progressive)
+ PNG 8-bit-per-channel only
+
+ TGA (not sure what subset, if a subset)
+ BMP non-1bpp, non-RLE
+ PSD (composited view only, no extra channels)
+
+ GIF (*comp always reports as 4-channel)
+ HDR (radiance rgbE format)
+ PIC (Softimage PIC)
+
+ - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
+ - decode from arbitrary I/O callbacks
+ - overridable dequantizing-IDCT, YCbCr-to-RGB conversion (define STBI_SIMD)
+
+ Latest revisions:
+ 1.35 (2014-05-27) warnings, bugfixes, TGA optimization, etc
+ 1.34 (unknown ) warning fix
+ 1.33 (2011-07-14) minor fixes suggested by Dave Moore
+ 1.32 (2011-07-13) info support for all filetypes (SpartanJ)
+ 1.31 (2011-06-19) a few more leak fixes, bug in PNG handling (SpartanJ)
+ 1.30 (2011-06-11) added ability to load files via io callbacks (Ben Wenger)
+ 1.29 (2010-08-16) various warning fixes from Aurelien Pocheville
+ 1.28 (2010-08-01) fix bug in GIF palette transparency (SpartanJ)
+
+ See end of file for full revision history.
+
+ TODO:
+ stbi_info support for BMP,PSD,HDR,PIC
+
+
+ ============================ Contributors =========================
+
+ Image formats
+ Sean Barrett (jpeg, png, bmp)
+ Nicolas Schulz (hdr, psd)
+ Jonathan Dummer (tga)
+ Jean-Marc Lienher (gif)
+ Tom Seddon (pic)
+ Thatcher Ulrich (psd)
+
+ Extensions, features
+ Jetro Lauha (stbi_info)
+ James "moose2000" Brown (iPhone PNG)
+ Ben "Disch" Wenger (io callbacks)
+ Martin "SpartanJ" Golini
+
+ Optimizations & bugfixes
+ Fabian "ryg" Giesen
+ Arseny Kapoulkine
+
+ Bug fixes & warning fixes
+ Marc LeBlanc
+ Christopher Lloyd
+ Dave Moore
+ Won Chun
+ the Horde3D community
+ Janez Zemva
+ Jonathan Blow
+ Laurent Gomila
+ Aurelien Pocheville
+ Raymond Barbiero
+ David Woo
+ Roy Eltham
+ Luke Graham
+ Thomas Ruf
+ John Bartholomew
+ Ken Hamada
+ Cort Stratton
+ Blazej Dariusz Roszkowski
+ Thibault Reuille
+ Paul Du Bois
+ Guillaume George
+
+ If your name should be here but isn't, let Sean know.
+
+*/
+
+#ifndef STBI_INCLUDE_STB_IMAGE_H
+#define STBI_INCLUDE_STB_IMAGE_H
+
+// To get a header file for this, either cut and paste the header,
+// or create stb_image.h, #define STBI_HEADER_FILE_ONLY, and
+// then include stb_image.c from it.
+
+//// begin header file ////////////////////////////////////////////////////
+//
+// Limitations:
+// - no jpeg progressive support
+// - non-HDR formats support 8-bit samples only (jpeg, png)
+// - no delayed line count (jpeg) -- IJG doesn't support either
+// - no 1-bit BMP
+// - GIF always returns *comp=4
+//
+// Basic usage (see HDR discussion below):
+// int x,y,n;
+// unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
+// // ... process data if not NULL ...
+// // ... x = width, y = height, n = # 8-bit components per pixel ...
+// // ... replace '0' with '1'..'4' to force that many components per pixel
+// // ... but 'n' will always be the number that it would have been if you said 0
+// stbi_image_free(data);
+//
+// Standard parameters:
+// int *x -- outputs image width in pixels
+// int *y -- outputs image height in pixels
+// int *comp -- outputs # of image components in image file
+// int req_comp -- if non-zero, # of image components requested in result
+//
+// The return value from an image loader is an 'unsigned char *' which points
+// to the pixel data. The pixel data consists of *y scanlines of *x pixels,
+// with each pixel consisting of N interleaved 8-bit components; the first
+// pixel pointed to is top-left-most in the image. There is no padding between
+// image scanlines or between pixels, regardless of format. The number of
+// components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
+// If req_comp is non-zero, *comp has the number of components that _would_
+// have been output otherwise. E.g. if you set req_comp to 4, you will always
+// get RGBA output, but you can check *comp to easily see if it's opaque.
+//
+// An output image with N components has the following components interleaved
+// in this order in each pixel:
+//
+// N=#comp components
+// 1 grey
+// 2 grey, alpha
+// 3 red, green, blue
+// 4 red, green, blue, alpha
+//
+// If image loading fails for any reason, the return value will be NULL,
+// and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
+// can be queried for an extremely brief, end-user unfriendly explanation
+// of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
+// compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
+// more user-friendly ones.
+//
+// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
+//
+// ===========================================================================
+//
+// iPhone PNG support:
+//
+// By default we convert iphone-formatted PNGs back to RGB; nominally they
+// would silently load as BGR, except the existing code should have just
+// failed on such iPhone PNGs. But you can disable this conversion
+// by calling stbi_convert_iphone_png_to_rgb(0), in which case
+// you will always just get the native iphone "format" through.
+//
+// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
+// pixel to remove any premultiplied alpha *only* if the image file explicitly
+// says there's premultiplied data (currently only happens in iPhone images,
+// and only if iPhone convert-to-rgb processing is on).
+//
+// ===========================================================================
+//
+// HDR image support (disable by defining STBI_NO_HDR)
+//
+// stb_image now supports loading HDR images in general, and currently
+// the Radiance .HDR file format, although the support is provided
+// generically. You can still load any file through the existing interface;
+// if you attempt to load an HDR file, it will be automatically remapped to
+// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
+// both of these constants can be reconfigured through this interface:
+//
+// stbi_hdr_to_ldr_gamma(2.2f);
+// stbi_hdr_to_ldr_scale(1.0f);
+//
+// (note, do not use _inverse_ constants; stb_image will invert them
+// appropriately).
+//
+// Additionally, there is a new, parallel interface for loading files as
+// (linear) floats to preserve the full dynamic range:
+//
+// float *data = stbi_loadf(filename, &x, &y, &n, 0);
+//
+// If you load LDR images through this interface, those images will
+// be promoted to floating point values, run through the inverse of
+// constants corresponding to the above:
+//
+// stbi_ldr_to_hdr_scale(1.0f);
+// stbi_ldr_to_hdr_gamma(2.2f);
+//
+// Finally, given a filename (or an open file or memory block--see header
+// file for details) containing image data, you can query for the "most
+// appropriate" interface to use (that is, whether the image is HDR or
+// not), using:
+//
+// stbi_is_hdr(char *filename);
+//
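+// As an illustrative sketch (the filename is hypothetical), an application
+// might dispatch on the result:
+//
+//    if (stbi_is_hdr("env.hdr")) {
+//       float *hdr = stbi_loadf("env.hdr", &x, &y, &n, 0);      // linear floats
+//       // ... tone-map, then stbi_image_free(hdr) ...
+//    } else {
+//       unsigned char *ldr = stbi_load("env.hdr", &x, &y, &n, 0); // 8-bit
+//       // ...
+//    }
+//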
+// ===========================================================================
+//
+// I/O callbacks
+//
+// I/O callbacks allow you to read from arbitrary sources, like packaged
+// files or some other source. Data read from callbacks are processed
+// through a small internal buffer (currently 128 bytes) to try to reduce
+// overhead.
+//
+// The three functions you must define are "read" (reads some bytes of data),
+// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
+
+
+#ifndef STBI_NO_STDIO
+
+#if defined(_MSC_VER) && _MSC_VER >= 1400
+#define _CRT_SECURE_NO_WARNINGS // suppress warnings about fopen()
+#pragma warning(push)
+#pragma warning(disable:4996) // suppress even more warnings about fopen()
+#endif
+#include <stdio.h>
+#endif // STBI_NO_STDIO
+
+#define STBI_VERSION 1
+
+enum
+{
+ STBI_default = 0, // only used for req_comp
+
+ STBI_grey = 1,
+ STBI_grey_alpha = 2,
+ STBI_rgb = 3,
+ STBI_rgb_alpha = 4
+};
+
+typedef unsigned char stbi_uc;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// PRIMARY API - works on images of any type
+//
+
+//
+// load image by filename, open file, or memory buffer
+//
+
+extern stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
+
+#ifndef STBI_NO_STDIO
+extern stbi_uc *stbi_load (char const *filename, int *x, int *y, int *comp, int req_comp);
+extern stbi_uc *stbi_load_from_file (FILE *f, int *x, int *y, int *comp, int req_comp);
+// for stbi_load_from_file, file pointer is left pointing immediately after image
+#endif
+
+typedef struct
+{
+ int (*read) (void *user,char *data,int size); // fill 'data' with 'size' bytes. return number of bytes actually read
+ void (*skip) (void *user,int n); // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
+ int (*eof) (void *user); // returns nonzero if we are at end of file/data
+} stbi_io_callbacks;
+
+extern stbi_uc *stbi_load_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
+
+#ifndef STBI_NO_HDR
+ extern float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
+
+ #ifndef STBI_NO_STDIO
+ extern float *stbi_loadf (char const *filename, int *x, int *y, int *comp, int req_comp);
+ extern float *stbi_loadf_from_file (FILE *f, int *x, int *y, int *comp, int req_comp);
+ #endif
+
+ extern float *stbi_loadf_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
+
+ extern void stbi_hdr_to_ldr_gamma(float gamma);
+ extern void stbi_hdr_to_ldr_scale(float scale);
+
+ extern void stbi_ldr_to_hdr_gamma(float gamma);
+ extern void stbi_ldr_to_hdr_scale(float scale);
+#endif // STBI_NO_HDR
+
+// stbi_is_hdr is always defined
+extern int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
+extern int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
+#ifndef STBI_NO_STDIO
+extern int stbi_is_hdr (char const *filename);
+extern int stbi_is_hdr_from_file(FILE *f);
+#endif // STBI_NO_STDIO
+
+
+// get a VERY brief reason for failure
+// NOT THREADSAFE
+extern const char *stbi_failure_reason (void);
+
+// free the loaded image -- this is just free()
+extern void stbi_image_free (void *retval_from_stbi_load);
+
+// get image dimensions & components without fully decoding
+extern int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
+extern int stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
+
+#ifndef STBI_NO_STDIO
+extern int stbi_info (char const *filename, int *x, int *y, int *comp);
+extern int stbi_info_from_file (FILE *f, int *x, int *y, int *comp);
+
+#endif
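+
+// Illustrative sketch (hypothetical filename and size limits): probe the
+// dimensions cheaply before committing to a full decode:
+//
+//    int w, h, n;
+//    if (stbi_info("big.png", &w, &h, &n) && w <= 4096 && h <= 4096) {
+//       // small enough; now actually decode with stbi_load
+//    }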
+
+
+
+// for image formats that explicitly notate that they have premultiplied alpha,
+// we just return the colors as stored in the file. set this flag to force
+// unpremultiplication. results are undefined if the unpremultiply overflows.
+extern void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
+
+// indicate whether we should process iphone images back to canonical format,
+// or just pass them through "as-is"
+extern void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
+
+
+// ZLIB client - used by PNG, available for other purposes
+
+extern char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
+extern char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
+extern char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
+extern int stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
+
+extern char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
+extern int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
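+
+// Hedged usage sketch (not part of the API above): 'zbuf' and 'zlen' stand in
+// for hypothetical caller-owned compressed data. The malloc variants return a
+// malloc'd buffer (release it with free()) and store the decompressed size in
+// *outlen, or return NULL on error:
+//
+//    int outlen;
+//    char *out = stbi_zlib_decode_malloc(zbuf, zlen, &outlen);
+//    if (out) { /* ... use outlen bytes ... */ free(out); }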
+
+
+// define faster low-level operations (typically SIMD support)
+#ifdef STBI_SIMD
+typedef void (*stbi_idct_8x8)(stbi_uc *out, int out_stride, short data[64], unsigned short *dequantize);
+// compute an integer IDCT on "input"
+// input[x] = data[x] * dequantize[x]
+// write results to 'out': 64 samples, each run of 8 spaced by 'out_stride'
+// CLAMP results to 0..255
+typedef void (*stbi_YCbCr_to_RGB_run)(stbi_uc *output, stbi_uc const *y, stbi_uc const *cb, stbi_uc const *cr, int count, int step);
+// compute a conversion from YCbCr to RGB
+// 'count' pixels
+// write pixels to 'output'; each pixel is 'step' bytes (either 3 or 4; if 4, write '255' as 4th), order R,G,B
+// y: Y input channel
+// cb: Cb input channel; scale/biased to be 0..255
+// cr: Cr input channel; scale/biased to be 0..255
+
+extern void stbi_install_idct(stbi_idct_8x8 func);
+extern void stbi_install_YCbCr_to_RGB(stbi_YCbCr_to_RGB_run func);
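+
+// Illustrative only: an application with, say, an SSE2 IDCT would install it
+// once at startup (my_idct_8x8 here is a hypothetical user function matching
+// the stbi_idct_8x8 signature above):
+//
+//    stbi_install_idct(my_idct_8x8);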
+#endif // STBI_SIMD
+
+
+#ifdef __cplusplus
+}
+#endif
+
+//
+//
+//// end header file /////////////////////////////////////////////////////
+#endif // STBI_INCLUDE_STB_IMAGE_H
+
+#ifndef STBI_HEADER_FILE_ONLY
+
+#ifndef STBI_NO_HDR
+#include <math.h>    // ldexp
+#include <string.h>  // strcmp, strtok
+#endif
+
+#ifndef STBI_NO_STDIO
+#include <stdio.h>
+#endif
+#include <stdlib.h>
+#include <memory.h>
+#include <assert.h>
+#include <stdarg.h>
+#include <stddef.h> // ptrdiff_t on osx
+
+#ifndef _MSC_VER
+ #ifdef __cplusplus
+ #define stbi_inline inline
+ #else
+ #define stbi_inline
+ #endif
+#else
+ #define stbi_inline __forceinline
+#endif
+
+
+#ifdef _MSC_VER
+typedef unsigned char stbi__uint8;
+typedef unsigned short stbi__uint16;
+typedef signed short stbi__int16;
+typedef unsigned int stbi__uint32;
+typedef signed int stbi__int32;
+#else
+#include <stdint.h>
+typedef uint8_t stbi__uint8;
+typedef uint16_t stbi__uint16;
+typedef int16_t stbi__int16;
+typedef uint32_t stbi__uint32;
+typedef int32_t stbi__int32;
+#endif
+
+// should produce compiler error if size is wrong
+typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
+
+#ifdef _MSC_VER
+#define STBI_NOTUSED(v) (void)(v)
+#else
+#define STBI_NOTUSED(v) (void)sizeof(v)
+#endif
+
+#ifdef _MSC_VER
+#define STBI_HAS_LROTL
+#endif
+
+#ifdef STBI_HAS_LROTL
+ #define stbi_lrot(x,y) _lrotl(x,y)
+#else
+ #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (32 - (y))))
+#endif
+
+///////////////////////////////////////////////
+//
+// stbi struct and start_xxx functions
+
+// stbi structure is our basic context used by all images, so it
+// contains all the IO context, plus some basic image information
+typedef struct
+{
+ stbi__uint32 img_x, img_y;
+ int img_n, img_out_n;
+
+ stbi_io_callbacks io;
+ void *io_user_data;
+
+ int read_from_callbacks;
+ int buflen;
+ stbi__uint8 buffer_start[128];
+
+ stbi__uint8 *img_buffer, *img_buffer_end;
+ stbi__uint8 *img_buffer_original;
+} stbi;
+
+
+static void refill_buffer(stbi *s);
+
+// initialize a memory-decode context
+static void start_mem(stbi *s, stbi__uint8 const *buffer, int len)
+{
+ s->io.read = NULL;
+ s->read_from_callbacks = 0;
+ s->img_buffer = s->img_buffer_original = (stbi__uint8 *) buffer;
+ s->img_buffer_end = (stbi__uint8 *) buffer+len;
+}
+
+// initialize a callback-based context
+static void start_callbacks(stbi *s, stbi_io_callbacks *c, void *user)
+{
+ s->io = *c;
+ s->io_user_data = user;
+ s->buflen = sizeof(s->buffer_start);
+ s->read_from_callbacks = 1;
+ s->img_buffer_original = s->buffer_start;
+ refill_buffer(s);
+}
+
+#ifndef STBI_NO_STDIO
+
+static int stdio_read(void *user, char *data, int size)
+{
+ return (int) fread(data,1,size,(FILE*) user);
+}
+
+static void stdio_skip(void *user, int n)
+{
+ fseek((FILE*) user, n, SEEK_CUR);
+}
+
+static int stdio_eof(void *user)
+{
+ return feof((FILE*) user);
+}
+
+static stbi_io_callbacks stbi_stdio_callbacks =
+{
+ stdio_read,
+ stdio_skip,
+ stdio_eof,
+};
+
+static void start_file(stbi *s, FILE *f)
+{
+ start_callbacks(s, &stbi_stdio_callbacks, (void *) f);
+}
+
+//static void stop_file(stbi *s) { }
+
+#endif // !STBI_NO_STDIO
+
+static void stbi_rewind(stbi *s)
+{
+ // conceptually rewind SHOULD rewind to the beginning of the stream,
+ // but we just rewind to the beginning of the initial buffer, because
+ // we only use it after doing 'test', which only ever looks at at most 92 bytes
+ s->img_buffer = s->img_buffer_original;
+}
+
+static int stbi_jpeg_test(stbi *s);
+static stbi_uc *stbi_jpeg_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+static int stbi_jpeg_info(stbi *s, int *x, int *y, int *comp);
+static int stbi_png_test(stbi *s);
+static stbi_uc *stbi_png_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+static int stbi_png_info(stbi *s, int *x, int *y, int *comp);
+static int stbi_bmp_test(stbi *s);
+static stbi_uc *stbi_bmp_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+static int stbi_tga_test(stbi *s);
+static stbi_uc *stbi_tga_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+static int stbi_tga_info(stbi *s, int *x, int *y, int *comp);
+static int stbi_psd_test(stbi *s);
+static stbi_uc *stbi_psd_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+#ifndef STBI_NO_HDR
+static int stbi_hdr_test(stbi *s);
+static float *stbi_hdr_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+#endif
+static int stbi_pic_test(stbi *s);
+static stbi_uc *stbi_pic_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+static int stbi_gif_test(stbi *s);
+static stbi_uc *stbi_gif_load(stbi *s, int *x, int *y, int *comp, int req_comp);
+static int stbi_gif_info(stbi *s, int *x, int *y, int *comp);
+
+
+// this is not threadsafe
+static const char *failure_reason;
+
+const char *stbi_failure_reason(void)
+{
+ return failure_reason;
+}
+
+static int e(const char *str)
+{
+ failure_reason = str;
+ return 0;
+}
+
+// e - error
+// epf - error returning pointer to float
+// epuc - error returning pointer to unsigned char
+
+#ifdef STBI_NO_FAILURE_STRINGS
+ #define e(x,y) 0
+#elif defined(STBI_FAILURE_USERMSG)
+ #define e(x,y) e(y)
+#else
+ #define e(x,y) e(x)
+#endif
+
+#define epf(x,y) ((float *) (e(x,y)?NULL:NULL))
+#define epuc(x,y) ((unsigned char *) (e(x,y)?NULL:NULL))
+
+void stbi_image_free(void *retval_from_stbi_load)
+{
+ free(retval_from_stbi_load);
+}
+
+#ifndef STBI_NO_HDR
+static float *ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
+static stbi_uc *hdr_to_ldr(float *data, int x, int y, int comp);
+#endif
+
+static unsigned char *stbi_load_main(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ if (stbi_jpeg_test(s)) return stbi_jpeg_load(s,x,y,comp,req_comp);
+ if (stbi_png_test(s)) return stbi_png_load(s,x,y,comp,req_comp);
+ if (stbi_bmp_test(s)) return stbi_bmp_load(s,x,y,comp,req_comp);
+ if (stbi_gif_test(s)) return stbi_gif_load(s,x,y,comp,req_comp);
+ if (stbi_psd_test(s)) return stbi_psd_load(s,x,y,comp,req_comp);
+ if (stbi_pic_test(s)) return stbi_pic_load(s,x,y,comp,req_comp);
+
+ #ifndef STBI_NO_HDR
+ if (stbi_hdr_test(s)) {
+ float *hdr = stbi_hdr_load(s, x,y,comp,req_comp);
+ return hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
+ }
+ #endif
+
+ // test tga last because it's a crappy test!
+ if (stbi_tga_test(s))
+ return stbi_tga_load(s,x,y,comp,req_comp);
+ return epuc("unknown image type", "Image not of any known type, or corrupt");
+}
+
+#ifndef STBI_NO_STDIO
+unsigned char *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+ FILE *f = fopen(filename, "rb");
+ unsigned char *result;
+ if (!f) return epuc("can't fopen", "Unable to open file");
+ result = stbi_load_from_file(f,x,y,comp,req_comp);
+ fclose(f);
+ return result;
+}
+
+unsigned char *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+ unsigned char *result;
+ stbi s;
+ start_file(&s,f);
+ result = stbi_load_main(&s,x,y,comp,req_comp);
+ if (result) {
+ // need to 'unget' all the characters in the IO buffer
+ fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
+ }
+ return result;
+}
+#endif //!STBI_NO_STDIO
+
+unsigned char *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
+{
+ stbi s;
+ start_mem(&s,buffer,len);
+ return stbi_load_main(&s,x,y,comp,req_comp);
+}
+
+unsigned char *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
+{
+ stbi s;
+ start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+ return stbi_load_main(&s,x,y,comp,req_comp);
+}
+
+#ifndef STBI_NO_HDR
+
+float *stbi_loadf_main(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ unsigned char *data;
+ #ifndef STBI_NO_HDR
+ if (stbi_hdr_test(s))
+ return stbi_hdr_load(s,x,y,comp,req_comp);
+ #endif
+ data = stbi_load_main(s, x, y, comp, req_comp);
+ if (data)
+ return ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
+ return epf("unknown image type", "Image not of any known type, or corrupt");
+}
+
+float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
+{
+ stbi s;
+ start_mem(&s,buffer,len);
+ return stbi_loadf_main(&s,x,y,comp,req_comp);
+}
+
+float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
+{
+ stbi s;
+ start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+ return stbi_loadf_main(&s,x,y,comp,req_comp);
+}
+
+#ifndef STBI_NO_STDIO
+float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+ FILE *f = fopen(filename, "rb");
+ float *result;
+ if (!f) return epf("can't fopen", "Unable to open file");
+ result = stbi_loadf_from_file(f,x,y,comp,req_comp);
+ fclose(f);
+ return result;
+}
+
+float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+ stbi s;
+ start_file(&s,f);
+ return stbi_loadf_main(&s,x,y,comp,req_comp);
+}
+#endif // !STBI_NO_STDIO
+
+#endif // !STBI_NO_HDR
+
+// these is-hdr-or-not queries are defined independent of whether STBI_NO_HDR
+// is defined, for API simplicity; if STBI_NO_HDR is defined, they always
+// report false!
+
+int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
+{
+ #ifndef STBI_NO_HDR
+ stbi s;
+ start_mem(&s,buffer,len);
+ return stbi_hdr_test(&s);
+ #else
+ STBI_NOTUSED(buffer);
+ STBI_NOTUSED(len);
+ return 0;
+ #endif
+}
+
+#ifndef STBI_NO_STDIO
+extern int stbi_is_hdr (char const *filename)
+{
+ FILE *f = fopen(filename, "rb");
+ int result=0;
+ if (f) {
+ result = stbi_is_hdr_from_file(f);
+ fclose(f);
+ }
+ return result;
+}
+
+extern int stbi_is_hdr_from_file(FILE *f)
+{
+ #ifndef STBI_NO_HDR
+ stbi s;
+ start_file(&s,f);
+ return stbi_hdr_test(&s);
+ #else
+ return 0;
+ #endif
+}
+#endif // !STBI_NO_STDIO
+
+extern int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
+{
+ #ifndef STBI_NO_HDR
+ stbi s;
+ start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+ return stbi_hdr_test(&s);
+ #else
+ return 0;
+ #endif
+}
+
+#ifndef STBI_NO_HDR
+static float h2l_gamma_i=1.0f/2.2f, h2l_scale_i=1.0f;
+static float l2h_gamma=2.2f, l2h_scale=1.0f;
+
+void stbi_hdr_to_ldr_gamma(float gamma) { h2l_gamma_i = 1/gamma; }
+void stbi_hdr_to_ldr_scale(float scale) { h2l_scale_i = 1/scale; }
+
+void stbi_ldr_to_hdr_gamma(float gamma) { l2h_gamma = gamma; }
+void stbi_ldr_to_hdr_scale(float scale) { l2h_scale = scale; }
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Common code used by all image loaders
+//
+
+enum
+{
+ SCAN_load=0,
+ SCAN_type,
+ SCAN_header
+};
+
+static void refill_buffer(stbi *s)
+{
+ int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
+ if (n == 0) {
+ // at end of file, treat same as if from memory, but need to handle case
+ // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
+ s->read_from_callbacks = 0;
+ s->img_buffer = s->buffer_start;
+ s->img_buffer_end = s->buffer_start+1;
+ *s->img_buffer = 0;
+ } else {
+ s->img_buffer = s->buffer_start;
+ s->img_buffer_end = s->buffer_start + n;
+ }
+}
+
+stbi_inline static int get8(stbi *s)
+{
+ if (s->img_buffer < s->img_buffer_end)
+ return *s->img_buffer++;
+ if (s->read_from_callbacks) {
+ refill_buffer(s);
+ return *s->img_buffer++;
+ }
+ return 0;
+}
+
+stbi_inline static int at_eof(stbi *s)
+{
+ if (s->io.read) {
+ if (!(s->io.eof)(s->io_user_data)) return 0;
+ // if feof() is true, check if buffer = end
+ // special case: we've only got the special 0 character at the end
+ if (s->read_from_callbacks == 0) return 1;
+ }
+
+ return s->img_buffer >= s->img_buffer_end;
+}
+
+stbi_inline static stbi__uint8 get8u(stbi *s)
+{
+ return (stbi__uint8) get8(s);
+}
+
+static void skip(stbi *s, int n)
+{
+ if (s->io.read) {
+ int blen = (int) (s->img_buffer_end - s->img_buffer);
+ if (blen < n) {
+ s->img_buffer = s->img_buffer_end;
+ (s->io.skip)(s->io_user_data, n - blen);
+ return;
+ }
+ }
+ s->img_buffer += n;
+}
+
+static int getn(stbi *s, stbi_uc *buffer, int n)
+{
+ if (s->io.read) {
+ int blen = (int) (s->img_buffer_end - s->img_buffer);
+ if (blen < n) {
+ int res, count;
+
+ memcpy(buffer, s->img_buffer, blen);
+
+ count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
+ res = (count == (n-blen));
+ s->img_buffer = s->img_buffer_end;
+ return res;
+ }
+ }
+
+ if (s->img_buffer+n <= s->img_buffer_end) {
+ memcpy(buffer, s->img_buffer, n);
+ s->img_buffer += n;
+ return 1;
+ } else
+ return 0;
+}
+
+static int get16(stbi *s)
+{
+ int z = get8(s);
+ return (z << 8) + get8(s);
+}
+
+static stbi__uint32 get32(stbi *s)
+{
+ stbi__uint32 z = get16(s);
+ return (z << 16) + get16(s);
+}
+
+static int get16le(stbi *s)
+{
+ int z = get8(s);
+ return z + (get8(s) << 8);
+}
+
+static stbi__uint32 get32le(stbi *s)
+{
+ stbi__uint32 z = get16le(s);
+ return z + (get16le(s) << 16);
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// generic converter from built-in img_n to req_comp
+// individual types do this automatically as much as possible (e.g. jpeg
+// does all cases internally since it needs to colorspace convert anyway,
+// and it never has alpha, so very few cases). png can automatically
+// interleave an alpha=255 channel, but falls back to this for other cases
+//
+// assume data buffer is malloced, so malloc a new one and free that one
+// only failure mode is malloc failing
+
+static stbi__uint8 compute_y(int r, int g, int b)
+{
+ return (stbi__uint8) (((r*77) + (g*150) + (29*b)) >> 8);
+}
+
+static unsigned char *convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
+{
+ int i,j;
+ unsigned char *good;
+
+ if (req_comp == img_n) return data;
+ assert(req_comp >= 1 && req_comp <= 4);
+
+ good = (unsigned char *) malloc(req_comp * x * y);
+ if (good == NULL) {
+ free(data);
+ return epuc("outofmem", "Out of memory");
+ }
+
+ for (j=0; j < (int) y; ++j) {
+ unsigned char *src = data + j * x * img_n ;
+ unsigned char *dest = good + j * x * req_comp;
+
+ #define COMBO(a,b) ((a)*8+(b))
+ #define CASE(a,b) case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
+ // convert source image with img_n components to one with req_comp components;
+ // avoid switch per pixel, so use switch per scanline and massive macros
+ switch (COMBO(img_n, req_comp)) {
+ CASE(1,2) dest[0]=src[0], dest[1]=255; break;
+ CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
+ CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
+ CASE(2,1) dest[0]=src[0]; break;
+ CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
+ CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
+ CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
+ CASE(3,1) dest[0]=compute_y(src[0],src[1],src[2]); break;
+ CASE(3,2) dest[0]=compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
+ CASE(4,1) dest[0]=compute_y(src[0],src[1],src[2]); break;
+ CASE(4,2) dest[0]=compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
+ CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
+ default: assert(0);
+ }
+ #undef CASE
+ }
+
+ free(data);
+ return good;
+}
+
+#ifndef STBI_NO_HDR
+static float *ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
+{
+ int i,k,n;
+ float *output = (float *) malloc(x * y * comp * sizeof(float));
+ if (output == NULL) { free(data); return epf("outofmem", "Out of memory"); }
+ // compute number of non-alpha components
+ if (comp & 1) n = comp; else n = comp-1;
+ for (i=0; i < x*y; ++i) {
+ for (k=0; k < n; ++k) {
+ output[i*comp + k] = (float) pow(data[i*comp+k]/255.0f, l2h_gamma) * l2h_scale;
+ }
+ if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
+ }
+ free(data);
+ return output;
+}
+
+#define float2int(x) ((int) (x))
+static stbi_uc *hdr_to_ldr(float *data, int x, int y, int comp)
+{
+ int i,k,n;
+ stbi_uc *output = (stbi_uc *) malloc(x * y * comp);
+ if (output == NULL) { free(data); return epuc("outofmem", "Out of memory"); }
+ // compute number of non-alpha components
+ if (comp & 1) n = comp; else n = comp-1;
+ for (i=0; i < x*y; ++i) {
+ for (k=0; k < n; ++k) {
+ float z = (float) pow(data[i*comp+k]*h2l_scale_i, h2l_gamma_i) * 255 + 0.5f;
+ if (z < 0) z = 0;
+ if (z > 255) z = 255;
+ output[i*comp + k] = (stbi__uint8) float2int(z);
+ }
+ if (k < comp) {
+ float z = data[i*comp+k] * 255 + 0.5f;
+ if (z < 0) z = 0;
+ if (z > 255) z = 255;
+ output[i*comp + k] = (stbi__uint8) float2int(z);
+ }
+ }
+ free(data);
+ return output;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// "baseline" JPEG/JFIF decoder (not actually fully baseline implementation)
+//
+// simple implementation
+// - channel subsampling of at most 2 in each dimension
+// - doesn't support delayed output of y-dimension
+// - simple interface (only one output format: 8-bit interleaved RGB)
+// - doesn't try to recover corrupt jpegs
+// - doesn't allow partial loading, loading multiple at once
+// - still fast on x86 (copying globals into locals doesn't help x86)
+// - allocates lots of intermediate memory (full size of all components)
+// - non-interleaved case requires this anyway
+// - allows good upsampling (see next)
+// high-quality
+// - upsampled channels are bilinearly interpolated, even across blocks
+// - quality integer IDCT derived from IJG's 'slow'
+// performance
+// - fast huffman; reasonable integer IDCT
+// - uses a lot of intermediate memory, could cache poorly
+// - load http://nothings.org/remote/anemones.jpg 3 times on 2.8GHz P4
+// stb_jpeg: 1.34 seconds (MSVC6, default release build)
+// stb_jpeg: 1.06 seconds (MSVC6, processor = Pentium Pro)
+// IJL11.dll: 1.08 seconds (compiled by intel)
+// IJG 1998: 0.98 seconds (MSVC6, makefile provided by IJG)
+// IJG 1998: 0.95 seconds (MSVC6, makefile + proc=PPro)
+
+// huffman decoding acceleration
+#define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
+
+typedef struct
+{
+ stbi__uint8 fast[1 << FAST_BITS];
+ // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
+ stbi__uint16 code[256];
+ stbi__uint8 values[256];
+ stbi__uint8 size[257];
+ unsigned int maxcode[18];
+ int delta[17]; // old 'firstsymbol' - old 'firstcode'
+} huffman;
+
+typedef struct
+{
+ #ifdef STBI_SIMD
+ unsigned short dequant2[4][64];
+ #endif
+ stbi *s;
+ huffman huff_dc[4];
+ huffman huff_ac[4];
+ stbi__uint8 dequant[4][64];
+
+// sizes for components, interleaved MCUs
+ int img_h_max, img_v_max;
+ int img_mcu_x, img_mcu_y;
+ int img_mcu_w, img_mcu_h;
+
+// definition of jpeg image component
+ struct
+ {
+ int id;
+ int h,v;
+ int tq;
+ int hd,ha;
+ int dc_pred;
+
+ int x,y,w2,h2;
+ stbi__uint8 *data;
+ void *raw_data;
+ stbi__uint8 *linebuf;
+ } img_comp[4];
+
+ stbi__uint32 code_buffer; // jpeg entropy-coded buffer
+ int code_bits; // number of valid bits
+ unsigned char marker; // marker seen while filling entropy buffer
+ int nomore; // flag if we saw a marker so must stop
+
+ int scan_n, order[4];
+ int restart_interval, todo;
+} jpeg;
+
+static int build_huffman(huffman *h, int *count)
+{
+ int i,j,k=0,code;
+ // build size list for each symbol (from JPEG spec)
+ for (i=0; i < 16; ++i)
+ for (j=0; j < count[i]; ++j)
+ h->size[k++] = (stbi__uint8) (i+1);
+ h->size[k] = 0;
+
+ // compute actual symbols (from jpeg spec)
+ code = 0;
+ k = 0;
+ for(j=1; j <= 16; ++j) {
+ // compute delta to add to code to compute symbol id
+ h->delta[j] = k - code;
+ if (h->size[k] == j) {
+ while (h->size[k] == j)
+ h->code[k++] = (stbi__uint16) (code++);
+ if (code-1 >= (1 << j)) return e("bad code lengths","Corrupt JPEG");
+ }
+ // compute largest code + 1 for this size, preshifted as needed later
+ h->maxcode[j] = code << (16-j);
+ code <<= 1;
+ }
+ h->maxcode[j] = 0xffffffff;
+
+ // build non-spec acceleration table; 255 is flag for not-accelerated
+ memset(h->fast, 255, 1 << FAST_BITS);
+ for (i=0; i < k; ++i) {
+ int s = h->size[i];
+ if (s <= FAST_BITS) {
+ int c = h->code[i] << (FAST_BITS-s);
+ int m = 1 << (FAST_BITS-s);
+ for (j=0; j < m; ++j) {
+ h->fast[c+j] = (stbi__uint8) i;
+ }
+ }
+ }
+ return 1;
+}
+
+static void grow_buffer_unsafe(jpeg *j)
+{
+ do {
+ int b = j->nomore ? 0 : get8(j->s);
+ if (b == 0xff) {
+ int c = get8(j->s);
+ if (c != 0) {
+ j->marker = (unsigned char) c;
+ j->nomore = 1;
+ return;
+ }
+ }
+ j->code_buffer |= b << (24 - j->code_bits);
+ j->code_bits += 8;
+ } while (j->code_bits <= 24);
+}
+
+// (1 << n) - 1
+static stbi__uint32 bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
+
+// decode a jpeg huffman value from the bitstream
+stbi_inline static int decode(jpeg *j, huffman *h)
+{
+ unsigned int temp;
+ int c,k;
+
+ if (j->code_bits < 16) grow_buffer_unsafe(j);
+
+ // look at the top FAST_BITS and determine what symbol ID it is,
+ // if the code is <= FAST_BITS
+ c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+ k = h->fast[c];
+ if (k < 255) {
+ int s = h->size[k];
+ if (s > j->code_bits)
+ return -1;
+ j->code_buffer <<= s;
+ j->code_bits -= s;
+ return h->values[k];
+ }
+
+ // naive test is to shift the code_buffer down so k bits are
+ // valid, then test against maxcode. To speed this up, we've
+ // preshifted maxcode left so that it has (16-k) 0s at the
+ // end; in other words, regardless of the number of bits, it
+ // wants to be compared against something shifted to have 16;
+ // that way we don't need to shift inside the loop.
+ temp = j->code_buffer >> 16;
+ for (k=FAST_BITS+1 ; ; ++k)
+ if (temp < h->maxcode[k])
+ break;
+ if (k == 17) {
+ // error! code not found
+ j->code_bits -= 16;
+ return -1;
+ }
+
+ if (k > j->code_bits)
+ return -1;
+
+ // convert the huffman code to the symbol id
+ c = ((j->code_buffer >> (32 - k)) & bmask[k]) + h->delta[k];
+ assert((((j->code_buffer) >> (32 - h->size[c])) & bmask[h->size[c]]) == h->code[c]);
+
+ // convert the id to a symbol
+ j->code_bits -= k;
+ j->code_buffer <<= k;
+ return h->values[c];
+}
+
+// combined JPEG 'receive' and JPEG 'extend', since baseline
+// always extends everything it receives.
+stbi_inline static int extend_receive(jpeg *j, int n)
+{
+ unsigned int m = 1 << (n-1);
+ unsigned int k;
+ if (j->code_bits < n) grow_buffer_unsafe(j);
+
+ #if 1
+ k = stbi_lrot(j->code_buffer, n);
+ j->code_buffer = k & ~bmask[n];
+ k &= bmask[n];
+ j->code_bits -= n;
+ #else
+ k = (j->code_buffer >> (32 - n)) & bmask[n];
+ j->code_bits -= n;
+ j->code_buffer <<= n;
+ #endif
+ // the following test is probably a random branch that won't
+ // predict well. I tried to table accelerate it but failed.
+ // maybe it's compiling as a conditional move?
+ if (k < m)
+ return k + 1 - (1 << n); // well-defined form of (-1 << n) + k + 1 (avoids left-shifting a negative)
+ else
+ return k;
+}
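+
+// Worked example (illustrative): extend_receive(j, 3) reads 3 bits. If
+// those bits are 010 (k=2), then k < m (m=4), so the result is
+// 2 - 8 + 1 = -5, matching JPEG's EXTEND procedure: small received
+// values map to the negative half [-7..-4], large ones stay in [4..7].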
+
+// given a value that's at position X in the zigzag stream,
+// where does it appear in the 8x8 matrix coded as row-major?
+static stbi__uint8 dezigzag[64+15] =
+{
+ 0, 1, 8, 16, 9, 2, 3, 10,
+ 17, 24, 32, 25, 18, 11, 4, 5,
+ 12, 19, 26, 33, 40, 48, 41, 34,
+ 27, 20, 13, 6, 7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36,
+ 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46,
+ 53, 60, 61, 54, 47, 55, 62, 63,
+ // let corrupt input sample past end
+ 63, 63, 63, 63, 63, 63, 63, 63,
+ 63, 63, 63, 63, 63, 63, 63
+};
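+
+// Example (illustrative): zigzag position 2 maps to dezigzag[2] == 8,
+// i.e. row 1, column 0 of the row-major 8x8 block; the 15 trailing 63s
+// keep data[dezigzag[k]] in bounds even when corrupt input pushes k
+// past 63 in decode_block below.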
+
+// decode one 64-entry block
+static int decode_block(jpeg *j, short data[64], huffman *hdc, huffman *hac, int b)
+{
+ int diff,dc,k;
+ int t = decode(j, hdc);
+ if (t < 0) return e("bad huffman code","Corrupt JPEG");
+
+ // zero all the ac values now so memset can clear them 32 bits at a time
+ memset(data,0,64*sizeof(data[0]));
+
+ diff = t ? extend_receive(j, t) : 0;
+ dc = j->img_comp[b].dc_pred + diff;
+ j->img_comp[b].dc_pred = dc;
+ data[0] = (short) dc;
+
+ // decode AC components, see JPEG spec
+ k = 1;
+ do {
+ int r,s;
+ int rs = decode(j, hac);
+ if (rs < 0) return e("bad huffman code","Corrupt JPEG");
+ s = rs & 15;
+ r = rs >> 4;
+ if (s == 0) {
+ if (rs != 0xf0) break; // end block
+ k += 16;
+ } else {
+ k += r;
+ // decode into unzigzag'd location
+ data[dezigzag[k++]] = (short) extend_receive(j,s);
+ }
+ } while (k < 64);
+ return 1;
+}
+
+// clamp an int to 0..255 and convert to a byte (catches IDCT overshoot in either direction)
+stbi_inline static stbi__uint8 clamp(int x)
+{
+ // trick to use a single test to catch both cases
+ if ((unsigned int) x > 255) {
+ if (x < 0) return 0;
+ if (x > 255) return 255;
+ }
+ return (stbi__uint8) x;
+}
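+
+// Note (illustrative): the unsigned cast folds both range checks into one
+// comparison; e.g. clamp(-3) casts to a value far above 255, so the slow
+// path runs and returns 0, while in-range inputs pay only the single test.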
+
+#define f2f(x) (int) (((x) * 4096 + 0.5))
+#define fsh(x) ((x) << 12)
+
+// derived from jidctint -- DCT_ISLOW
+#define IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
+ int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
+ p2 = s2; \
+ p3 = s6; \
+ p1 = (p2+p3) * f2f(0.5411961f); \
+ t2 = p1 + p3*f2f(-1.847759065f); \
+ t3 = p1 + p2*f2f( 0.765366865f); \
+ p2 = s0; \
+ p3 = s4; \
+ t0 = fsh(p2+p3); \
+ t1 = fsh(p2-p3); \
+ x0 = t0+t3; \
+ x3 = t0-t3; \
+ x1 = t1+t2; \
+ x2 = t1-t2; \
+ t0 = s7; \
+ t1 = s5; \
+ t2 = s3; \
+ t3 = s1; \
+ p3 = t0+t2; \
+ p4 = t1+t3; \
+ p1 = t0+t3; \
+ p2 = t1+t2; \
+ p5 = (p3+p4)*f2f( 1.175875602f); \
+ t0 = t0*f2f( 0.298631336f); \
+ t1 = t1*f2f( 2.053119869f); \
+ t2 = t2*f2f( 3.072711026f); \
+ t3 = t3*f2f( 1.501321110f); \
+ p1 = p5 + p1*f2f(-0.899976223f); \
+ p2 = p5 + p2*f2f(-2.562915447f); \
+ p3 = p3*f2f(-1.961570560f); \
+ p4 = p4*f2f(-0.390180644f); \
+ t3 += p1+p4; \
+ t2 += p2+p3; \
+ t1 += p2+p4; \
+ t0 += p1+p3;
+
+#ifdef STBI_SIMD
+typedef unsigned short stbi_dequantize_t;
+#else
+typedef stbi__uint8 stbi_dequantize_t;
+#endif
+
+// .344 seconds on 3*anemones.jpg
+static void idct_block(stbi__uint8 *out, int out_stride, short data[64], stbi_dequantize_t *dequantize)
+{
+ int i,val[64],*v=val;
+ stbi_dequantize_t *dq = dequantize;
+ stbi__uint8 *o;
+ short *d = data;
+
+ // columns
+ for (i=0; i < 8; ++i,++d,++dq, ++v) {
+ // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
+ if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
+ && d[40]==0 && d[48]==0 && d[56]==0) {
+ // no shortcut 0 seconds
+ // (1|2|3|4|5|6|7)==0 0 seconds
+ // all separate -0.047 seconds
+ // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
+ int dcterm = d[0] * dq[0] << 2;
+ v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
+ } else {
+ IDCT_1D(d[ 0]*dq[ 0],d[ 8]*dq[ 8],d[16]*dq[16],d[24]*dq[24],
+ d[32]*dq[32],d[40]*dq[40],d[48]*dq[48],d[56]*dq[56])
+ // constants scaled things up by 1<<12; let's bring them back
+ // down, but keep 2 extra bits of precision
+ x0 += 512; x1 += 512; x2 += 512; x3 += 512;
+ v[ 0] = (x0+t3) >> 10;
+ v[56] = (x0-t3) >> 10;
+ v[ 8] = (x1+t2) >> 10;
+ v[48] = (x1-t2) >> 10;
+ v[16] = (x2+t1) >> 10;
+ v[40] = (x2-t1) >> 10;
+ v[24] = (x3+t0) >> 10;
+ v[32] = (x3-t0) >> 10;
+ }
+ }
+
+ for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
+ // no fast case since the first 1D IDCT spread components out
+ IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
+ // constants scaled things up by 1<<12, plus we had 1<<2 from first
+ // loop, plus horizontal and vertical each scale by sqrt(8) so together
+ // we've got an extra 1<<3, so 1<<17 total we need to remove.
+ // so we want to round that, which means adding 0.5 * 1<<17,
+ // aka 65536. Also, we'll end up with -128 to 127 that we want
+ // to encode as 0..255 by adding 128, so we'll add that before the shift
+ x0 += 65536 + (128<<17);
+ x1 += 65536 + (128<<17);
+ x2 += 65536 + (128<<17);
+ x3 += 65536 + (128<<17);
+ // tried computing the shifts into temps, or'ing the temps to see
+ // if any were out of range, but that was slower
+ o[0] = clamp((x0+t3) >> 17);
+ o[7] = clamp((x0-t3) >> 17);
+ o[1] = clamp((x1+t2) >> 17);
+ o[6] = clamp((x1-t2) >> 17);
+ o[2] = clamp((x2+t1) >> 17);
+ o[5] = clamp((x2-t1) >> 17);
+ o[3] = clamp((x3+t0) >> 17);
+ o[4] = clamp((x3-t0) >> 17);
+ }
+}
+
+#ifdef STBI_SIMD
+static stbi_idct_8x8 stbi_idct_installed = idct_block;
+
+void stbi_install_idct(stbi_idct_8x8 func)
+{
+ stbi_idct_installed = func;
+}
+#endif
+
+#define MARKER_none 0xff
+// if there's a pending marker from the entropy stream, return that
+// otherwise, fetch from the stream and get a marker. if there's no
+// marker, return 0xff, which is never a valid marker value
+static stbi__uint8 get_marker(jpeg *j)
+{
+ stbi__uint8 x;
+ if (j->marker != MARKER_none) { x = j->marker; j->marker = MARKER_none; return x; }
+ x = get8u(j->s);
+ if (x != 0xff) return MARKER_none;
+ while (x == 0xff)
+ x = get8u(j->s);
+ return x;
+}
+
+// in each scan, we'll have scan_n components, and the order
+// of the components is specified by order[]
+#define RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
+
+// after a restart interval, reset the entropy decoder and
+// the dc prediction
+static void reset(jpeg *j)
+{
+ j->code_bits = 0;
+ j->code_buffer = 0;
+ j->nomore = 0;
+ j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
+ j->marker = MARKER_none;
+ j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
+ // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
+ // since we don't even allow 1<<30 pixels
+}
+
+static int parse_entropy_coded_data(jpeg *z)
+{
+ reset(z);
+ if (z->scan_n == 1) {
+ int i,j;
+ #ifdef STBI_SIMD
+ __declspec(align(16))
+ #endif
+ short data[64];
+ int n = z->order[0];
+ // non-interleaved data, we just need to process one block at a time,
+ // in trivial scanline order
+ // number of blocks to do just depends on how many actual "pixels" this
+ // component has, independent of interleaved MCU blocking and such
+ int w = (z->img_comp[n].x+7) >> 3;
+ int h = (z->img_comp[n].y+7) >> 3;
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; ++i) {
+ if (!decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+z->img_comp[n].ha, n)) return 0;
+ #ifdef STBI_SIMD
+ stbi_idct_installed(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data, z->dequant2[z->img_comp[n].tq]);
+ #else
+ idct_block(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data, z->dequant[z->img_comp[n].tq]);
+ #endif
+ // every data block is an MCU, so countdown the restart interval
+ if (--z->todo <= 0) {
+ if (z->code_bits < 24) grow_buffer_unsafe(z);
+ // if it's NOT a restart, then just bail, so we get corrupt data
+ // rather than no data
+ if (!RESTART(z->marker)) return 1;
+ reset(z);
+ }
+ }
+ }
+ } else { // interleaved!
+ int i,j,k,x,y;
+ short data[64];
+ for (j=0; j < z->img_mcu_y; ++j) {
+ for (i=0; i < z->img_mcu_x; ++i) {
+ // scan an interleaved mcu... process scan_n components in order
+ for (k=0; k < z->scan_n; ++k) {
+ int n = z->order[k];
+ // scan out an mcu's worth of this component; that's just determined
+ // by the basic H and V specified for the component
+ for (y=0; y < z->img_comp[n].v; ++y) {
+ for (x=0; x < z->img_comp[n].h; ++x) {
+ int x2 = (i*z->img_comp[n].h + x)*8;
+ int y2 = (j*z->img_comp[n].v + y)*8;
+ if (!decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+z->img_comp[n].ha, n)) return 0;
+ #ifdef STBI_SIMD
+ stbi_idct_installed(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data, z->dequant2[z->img_comp[n].tq]);
+ #else
+ idct_block(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data, z->dequant[z->img_comp[n].tq]);
+ #endif
+ }
+ }
+ }
+ // after all interleaved components, that's an interleaved MCU,
+ // so now count down the restart interval
+ if (--z->todo <= 0) {
+ if (z->code_bits < 24) grow_buffer_unsafe(z);
+ // if it's NOT a restart, then just bail, so we get corrupt data
+ // rather than no data
+ if (!RESTART(z->marker)) return 1;
+ reset(z);
+ }
+ }
+ }
+ }
+ return 1;
+}
+
+static int process_marker(jpeg *z, int m)
+{
+ int L;
+ switch (m) {
+ case MARKER_none: // no marker found
+ return e("expected marker","Corrupt JPEG");
+
+ case 0xC2: // SOF - progressive
+ return e("progressive jpeg","JPEG format not supported (progressive)");
+
+ case 0xDD: // DRI - specify restart interval
+ if (get16(z->s) != 4) return e("bad DRI len","Corrupt JPEG");
+ z->restart_interval = get16(z->s);
+ return 1;
+
+ case 0xDB: // DQT - define quantization table
+ L = get16(z->s)-2;
+ while (L > 0) {
+ int q = get8(z->s);
+ int p = q >> 4;
+ int t = q & 15,i;
+ if (p != 0) return e("bad DQT type","Corrupt JPEG");
+ if (t > 3) return e("bad DQT table","Corrupt JPEG");
+ for (i=0; i < 64; ++i)
+ z->dequant[t][dezigzag[i]] = get8u(z->s);
+ #ifdef STBI_SIMD
+ for (i=0; i < 64; ++i)
+ z->dequant2[t][i] = z->dequant[t][i];
+ #endif
+ L -= 65;
+ }
+ return L==0;
+
+ case 0xC4: // DHT - define huffman table
+ L = get16(z->s)-2;
+ while (L > 0) {
+ stbi__uint8 *v;
+ int sizes[16],i,n=0;
+ int q = get8(z->s);
+ int tc = q >> 4;
+ int th = q & 15;
+ if (tc > 1 || th > 3) return e("bad DHT header","Corrupt JPEG");
+ for (i=0; i < 16; ++i) {
+ sizes[i] = get8(z->s);
+ n += sizes[i];
+ }
+ L -= 17;
+ if (tc == 0) {
+ if (!build_huffman(z->huff_dc+th, sizes)) return 0;
+ v = z->huff_dc[th].values;
+ } else {
+ if (!build_huffman(z->huff_ac+th, sizes)) return 0;
+ v = z->huff_ac[th].values;
+ }
+ for (i=0; i < n; ++i)
+ v[i] = get8u(z->s);
+ L -= n;
+ }
+ return L==0;
+ }
+ // check for comment block or APP blocks
+ if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
+ skip(z->s, get16(z->s)-2);
+ return 1;
+ }
+ return 0;
+}
+
+// after we see SOS
+static int process_scan_header(jpeg *z)
+{
+ int i;
+ int Ls = get16(z->s);
+ z->scan_n = get8(z->s);
+ if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return e("bad SOS component count","Corrupt JPEG");
+ if (Ls != 6+2*z->scan_n) return e("bad SOS len","Corrupt JPEG");
+ for (i=0; i < z->scan_n; ++i) {
+ int id = get8(z->s), which;
+ int q = get8(z->s);
+ for (which = 0; which < z->s->img_n; ++which)
+ if (z->img_comp[which].id == id)
+ break;
+ if (which == z->s->img_n) return 0;
+ z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return e("bad DC huff","Corrupt JPEG");
+ z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return e("bad AC huff","Corrupt JPEG");
+ z->order[i] = which;
+ }
+ if (get8(z->s) != 0) return e("bad SOS","Corrupt JPEG");
+ get8(z->s); // should be 63, but might be 0
+ if (get8(z->s) != 0) return e("bad SOS","Corrupt JPEG");
+
+ return 1;
+}
+
+static int process_frame_header(jpeg *z, int scan)
+{
+ stbi *s = z->s;
+ int Lf,p,i,q, h_max=1,v_max=1,c;
+ Lf = get16(s); if (Lf < 11) return e("bad SOF len","Corrupt JPEG"); // JPEG
+ p = get8(s); if (p != 8) return e("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
+ s->img_y = get16(s); if (s->img_y == 0) return e("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
+ s->img_x = get16(s); if (s->img_x == 0) return e("0 width","Corrupt JPEG"); // JPEG requires
+ c = get8(s);
+ if (c != 3 && c != 1) return e("bad component count","Corrupt JPEG"); // JFIF requires
+ s->img_n = c;
+ for (i=0; i < c; ++i) {
+ z->img_comp[i].data = NULL;
+ z->img_comp[i].linebuf = NULL;
+ }
+
+ if (Lf != 8+3*s->img_n) return e("bad SOF len","Corrupt JPEG");
+
+ for (i=0; i < s->img_n; ++i) {
+ z->img_comp[i].id = get8(s);
+ if (z->img_comp[i].id != i+1) // JFIF requires
+ if (z->img_comp[i].id != i) // some version of jpegtran outputs non-JFIF-compliant files!
+ return e("bad component ID","Corrupt JPEG");
+ q = get8(s);
+ z->img_comp[i].h = (q >> 4); if (!z->img_comp[i].h || z->img_comp[i].h > 4) return e("bad H","Corrupt JPEG");
+ z->img_comp[i].v = q & 15; if (!z->img_comp[i].v || z->img_comp[i].v > 4) return e("bad V","Corrupt JPEG");
+ z->img_comp[i].tq = get8(s); if (z->img_comp[i].tq > 3) return e("bad TQ","Corrupt JPEG");
+ }
+
+ if (scan != SCAN_load) return 1;
+
+ if ((1 << 30) / s->img_x / s->img_n < s->img_y) return e("too large", "Image too large to decode");
+
+ for (i=0; i < s->img_n; ++i) {
+ if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
+ if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
+ }
+
+ // compute interleaved mcu info
+ z->img_h_max = h_max;
+ z->img_v_max = v_max;
+ z->img_mcu_w = h_max * 8;
+ z->img_mcu_h = v_max * 8;
+ z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
+ z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
+
+ for (i=0; i < s->img_n; ++i) {
+ // number of effective pixels (e.g. for non-interleaved MCU)
+ z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
+ z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
+ // to simplify generation, we'll allocate enough memory to decode
+ // the bogus oversized data from using interleaved MCUs and their
+ // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
+ // discard the extra data until colorspace conversion
+ z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
+ z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
+ z->img_comp[i].raw_data = malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
+ if (z->img_comp[i].raw_data == NULL) {
+ for(--i; i >= 0; --i) {
+ free(z->img_comp[i].raw_data);
+ z->img_comp[i].data = NULL;
+ }
+ return e("outofmem", "Out of memory");
+ }
+ // align blocks for installable-idct using mmx/sse
+ z->img_comp[i].data = (stbi__uint8*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
+ z->img_comp[i].linebuf = NULL;
+ }
+
+ return 1;
+}
+
+// use comparisons since some of these macros match more than one marker value (e.g. SOF)
+#define DNL(x) ((x) == 0xdc)
+#define SOI(x) ((x) == 0xd8)
+#define EOI(x) ((x) == 0xd9)
+#define SOF(x) ((x) == 0xc0 || (x) == 0xc1)
+#define SOS(x) ((x) == 0xda)
+
+static int decode_jpeg_header(jpeg *z, int scan)
+{
+ int m;
+ z->marker = MARKER_none; // initialize cached marker to empty
+ m = get_marker(z);
+ if (!SOI(m)) return e("no SOI","Corrupt JPEG");
+ if (scan == SCAN_type) return 1;
+ m = get_marker(z);
+ while (!SOF(m)) {
+ if (!process_marker(z,m)) return 0;
+ m = get_marker(z);
+ while (m == MARKER_none) {
+ // some files have extra padding after their blocks, so ok, we'll scan
+ if (at_eof(z->s)) return e("no SOF", "Corrupt JPEG");
+ m = get_marker(z);
+ }
+ }
+ if (!process_frame_header(z, scan)) return 0;
+ return 1;
+}
+
+static int decode_jpeg_image(jpeg *j)
+{
+ int m;
+ j->restart_interval = 0;
+ if (!decode_jpeg_header(j, SCAN_load)) return 0;
+ m = get_marker(j);
+ while (!EOI(m)) {
+ if (SOS(m)) {
+ if (!process_scan_header(j)) return 0;
+ if (!parse_entropy_coded_data(j)) return 0;
+ if (j->marker == MARKER_none ) {
+ // handle 0s at the end of image data from IP Kamera 9060
+ while (!at_eof(j->s)) {
+ int x = get8(j->s);
+ if (x == 255) {
+ j->marker = get8u(j->s);
+ break;
+ } else if (x != 0) {
+ return 0;
+ }
+ }
+ // if we reach eof without hitting a marker, get_marker() below will fail and we'll eventually return 0
+ }
+ } else {
+ if (!process_marker(j, m)) return 0;
+ }
+ m = get_marker(j);
+ }
+ return 1;
+}
+
+// static jfif-centered resampling (across block boundaries)
+
+typedef stbi__uint8 *(*resample_row_func)(stbi__uint8 *out, stbi__uint8 *in0, stbi__uint8 *in1,
+ int w, int hs);
+
+#define div4(x) ((stbi__uint8) ((x) >> 2))
+
+static stbi__uint8 *resample_row_1(stbi__uint8 *out, stbi__uint8 *in_near, stbi__uint8 *in_far, int w, int hs)
+{
+ STBI_NOTUSED(out);
+ STBI_NOTUSED(in_far);
+ STBI_NOTUSED(w);
+ STBI_NOTUSED(hs);
+ return in_near;
+}
+
+static stbi__uint8* resample_row_v_2(stbi__uint8 *out, stbi__uint8 *in_near, stbi__uint8 *in_far, int w, int hs)
+{
+ // need to generate two samples vertically for every one in input
+ int i;
+ STBI_NOTUSED(hs);
+ for (i=0; i < w; ++i)
+ out[i] = div4(3*in_near[i] + in_far[i] + 2);
+ return out;
+}
+
+static stbi__uint8* resample_row_h_2(stbi__uint8 *out, stbi__uint8 *in_near, stbi__uint8 *in_far, int w, int hs)
+{
+ // need to generate two samples horizontally for every one in input
+ int i;
+ stbi__uint8 *input = in_near;
+
+ if (w == 1) {
+ // if only one sample, can't do any interpolation
+ out[0] = out[1] = input[0];
+ return out;
+ }
+
+ out[0] = input[0];
+ out[1] = div4(input[0]*3 + input[1] + 2);
+ for (i=1; i < w-1; ++i) {
+ int n = 3*input[i]+2;
+ out[i*2+0] = div4(n+input[i-1]);
+ out[i*2+1] = div4(n+input[i+1]);
+ }
+ out[i*2+0] = div4(input[w-2]*3 + input[w-1] + 2);
+ out[i*2+1] = input[w-1];
+
+ STBI_NOTUSED(in_far);
+ STBI_NOTUSED(hs);
+
+ return out;
+}
+
+#define div16(x) ((stbi__uint8) ((x) >> 4))
+
+static stbi__uint8 *resample_row_hv_2(stbi__uint8 *out, stbi__uint8 *in_near, stbi__uint8 *in_far, int w, int hs)
+{
+ // need to generate 2x2 samples for every one in input
+ int i,t0,t1;
+ if (w == 1) {
+ out[0] = out[1] = div4(3*in_near[0] + in_far[0] + 2);
+ return out;
+ }
+
+ t1 = 3*in_near[0] + in_far[0];
+ out[0] = div4(t1+2);
+ for (i=1; i < w; ++i) {
+ t0 = t1;
+ t1 = 3*in_near[i]+in_far[i];
+ out[i*2-1] = div16(3*t0 + t1 + 8);
+ out[i*2 ] = div16(3*t1 + t0 + 8);
+ }
+ out[w*2-1] = div4(t1+2);
+
+ STBI_NOTUSED(hs);
+
+ return out;
+}
+
+static stbi__uint8 *resample_row_generic(stbi__uint8 *out, stbi__uint8 *in_near, stbi__uint8 *in_far, int w, int hs)
+{
+ // resample with nearest-neighbor
+ int i,j;
+ STBI_NOTUSED(in_far);
+ for (i=0; i < w; ++i)
+ for (j=0; j < hs; ++j)
+ out[i*hs+j] = in_near[i];
+ return out;
+}
+
+#define float2fixed(x) ((int) ((x) * 65536 + 0.5))
+
+// 0.38 seconds on 3*anemones.jpg (0.25 with processor = Pro)
+// VC6 without processor=Pro is generating multiple LEAs per multiply!
+static void YCbCr_to_RGB_row(stbi__uint8 *out, const stbi__uint8 *y, const stbi__uint8 *pcb, const stbi__uint8 *pcr, int count, int step)
+{
+ int i;
+ for (i=0; i < count; ++i) {
+ int y_fixed = (y[i] << 16) + 32768; // rounding
+ int r,g,b;
+ int cr = pcr[i] - 128;
+ int cb = pcb[i] - 128;
+ r = y_fixed + cr*float2fixed(1.40200f);
+ g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
+ b = y_fixed + cb*float2fixed(1.77200f);
+ r >>= 16;
+ g >>= 16;
+ b >>= 16;
+ if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
+ if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
+ if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
+ out[0] = (stbi__uint8)r;
+ out[1] = (stbi__uint8)g;
+ out[2] = (stbi__uint8)b;
+ out[3] = 255;
+ out += step;
+ }
+}
+
+#ifdef STBI_SIMD
+static stbi_YCbCr_to_RGB_run stbi_YCbCr_installed = YCbCr_to_RGB_row;
+
+void stbi_install_YCbCr_to_RGB(stbi_YCbCr_to_RGB_run func)
+{
+ stbi_YCbCr_installed = func;
+}
+#endif
+
+
+// clean up the temporary component buffers
+static void cleanup_jpeg(jpeg *j)
+{
+ int i;
+ for (i=0; i < j->s->img_n; ++i) {
+ if (j->img_comp[i].data) {
+ free(j->img_comp[i].raw_data);
+ j->img_comp[i].data = NULL;
+ }
+ if (j->img_comp[i].linebuf) {
+ free(j->img_comp[i].linebuf);
+ j->img_comp[i].linebuf = NULL;
+ }
+ }
+}
+
+typedef struct
+{
+ resample_row_func resample;
+ stbi__uint8 *line0,*line1;
+ int hs,vs; // expansion factor in each axis
+ int w_lores; // horizontal pixels pre-expansion
+ int ystep; // how far through vertical expansion we are
+ int ypos; // which pre-expansion row we're on
+} stbi_resample;
+
+static stbi__uint8 *load_jpeg_image(jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
+{
+ int n, decode_n;
+ // validate req_comp
+ if (req_comp < 0 || req_comp > 4) return epuc("bad req_comp", "Internal error");
+ z->s->img_n = 0;
+
+ // load a jpeg image from whichever source
+ if (!decode_jpeg_image(z)) { cleanup_jpeg(z); return NULL; }
+
+ // determine actual number of components to generate
+ n = req_comp ? req_comp : z->s->img_n;
+
+ if (z->s->img_n == 3 && n < 3)
+ decode_n = 1;
+ else
+ decode_n = z->s->img_n;
+
+ // resample and color-convert
+ {
+ int k;
+ unsigned int i,j;
+ stbi__uint8 *output;
+ stbi__uint8 *coutput[4];
+
+ stbi_resample res_comp[4];
+
+ for (k=0; k < decode_n; ++k) {
+ stbi_resample *r = &res_comp[k];
+
+ // allocate line buffer big enough for upsampling off the edges
+ // with upsample factor of 4
+ z->img_comp[k].linebuf = (stbi__uint8 *) malloc(z->s->img_x + 3);
+ if (!z->img_comp[k].linebuf) { cleanup_jpeg(z); return epuc("outofmem", "Out of memory"); }
+
+ r->hs = z->img_h_max / z->img_comp[k].h;
+ r->vs = z->img_v_max / z->img_comp[k].v;
+ r->ystep = r->vs >> 1;
+ r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
+ r->ypos = 0;
+ r->line0 = r->line1 = z->img_comp[k].data;
+
+ if (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
+ else if (r->hs == 1 && r->vs == 2) r->resample = resample_row_v_2;
+ else if (r->hs == 2 && r->vs == 1) r->resample = resample_row_h_2;
+ else if (r->hs == 2 && r->vs == 2) r->resample = resample_row_hv_2;
+ else r->resample = resample_row_generic;
+ }
+
+ // can't error after this, so this is safe
+ output = (stbi__uint8 *) malloc(n * z->s->img_x * z->s->img_y + 1);
+ if (!output) { cleanup_jpeg(z); return epuc("outofmem", "Out of memory"); }
+
+ // now go ahead and resample
+ for (j=0; j < z->s->img_y; ++j) {
+ stbi__uint8 *out = output + n * z->s->img_x * j;
+ for (k=0; k < decode_n; ++k) {
+ stbi_resample *r = &res_comp[k];
+ int y_bot = r->ystep >= (r->vs >> 1);
+ coutput[k] = r->resample(z->img_comp[k].linebuf,
+ y_bot ? r->line1 : r->line0,
+ y_bot ? r->line0 : r->line1,
+ r->w_lores, r->hs);
+ if (++r->ystep >= r->vs) {
+ r->ystep = 0;
+ r->line0 = r->line1;
+ if (++r->ypos < z->img_comp[k].y)
+ r->line1 += z->img_comp[k].w2;
+ }
+ }
+ if (n >= 3) {
+ stbi__uint8 *y = coutput[0];
+ if (z->s->img_n == 3) {
+ #ifdef STBI_SIMD
+ stbi_YCbCr_installed(out, y, coutput[1], coutput[2], z->s->img_x, n);
+ #else
+ YCbCr_to_RGB_row(out, y, coutput[1], coutput[2], z->s->img_x, n);
+ #endif
+ } else
+ for (i=0; i < z->s->img_x; ++i) {
+ out[0] = out[1] = out[2] = y[i];
+ out[3] = 255; // not used if n==3
+ out += n;
+ }
+ } else {
+ stbi__uint8 *y = coutput[0];
+ if (n == 1)
+ for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
+ else
+ for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
+ }
+ }
+ cleanup_jpeg(z);
+ *out_x = z->s->img_x;
+ *out_y = z->s->img_y;
+ if (comp) *comp = z->s->img_n; // report original components, not output
+ return output;
+ }
+}
+
+static unsigned char *stbi_jpeg_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ jpeg j;
+ j.s = s;
+ return load_jpeg_image(&j, x,y,comp,req_comp);
+}
+
+static int stbi_jpeg_test(stbi *s)
+{
+ int r;
+ jpeg j;
+ j.s = s;
+ r = decode_jpeg_header(&j, SCAN_type);
+ stbi_rewind(s);
+ return r;
+}
+
+static int stbi_jpeg_info_raw(jpeg *j, int *x, int *y, int *comp)
+{
+ if (!decode_jpeg_header(j, SCAN_header)) {
+ stbi_rewind( j->s );
+ return 0;
+ }
+ if (x) *x = j->s->img_x;
+ if (y) *y = j->s->img_y;
+ if (comp) *comp = j->s->img_n;
+ return 1;
+}
+
+static int stbi_jpeg_info(stbi *s, int *x, int *y, int *comp)
+{
+ jpeg j;
+ j.s = s;
+ return stbi_jpeg_info_raw(&j, x, y, comp);
+}
+
+// public domain zlib decode v0.2 Sean Barrett 2006-11-18
+// simple implementation
+// - all input must be provided in an upfront buffer
+// - all output is written to a single output buffer (can malloc/realloc)
+// performance
+// - fast huffman
+
+// fast-way is faster to check than jpeg huffman, but slow way is slower
+#define ZFAST_BITS 9 // accelerate all cases in default tables
+#define ZFAST_MASK ((1 << ZFAST_BITS) - 1)
+
+// zlib-style huffman encoding
+// (jpeg packs bits from the left, zlib from the right, so we can't share code)
+typedef struct
+{
+ stbi__uint16 fast[1 << ZFAST_BITS];
+ stbi__uint16 firstcode[16];
+ int maxcode[17];
+ stbi__uint16 firstsymbol[16];
+ stbi__uint8 size[288];
+ stbi__uint16 value[288];
+} zhuffman;
+
+stbi_inline static int bitreverse16(int n)
+{
+ n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
+ n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
+ n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
+ n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
+ return n;
+}
+
+stbi_inline static int bit_reverse(int v, int bits)
+{
+ assert(bits <= 16);
+ // to bit reverse n bits, reverse 16 and shift
+ // e.g. 11 bits, bit reverse and shift away 5
+ return bitreverse16(v) >> (16-bits);
+}
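+
+// Example (illustrative): bit_reverse(6, 3) reverses the 3-bit value 110
+// to 011: bitreverse16(6) == 0x6000, and 0x6000 >> (16-3) == 3.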
+
+static int zbuild_huffman(zhuffman *z, stbi__uint8 *sizelist, int num)
+{
+ int i,k=0;
+ int code, next_code[16], sizes[17];
+
+ // DEFLATE spec for generating codes
+ memset(sizes, 0, sizeof(sizes));
+ memset(z->fast, 255, sizeof(z->fast));
+ for (i=0; i < num; ++i)
+ ++sizes[sizelist[i]];
+ sizes[0] = 0;
+ for (i=1; i < 16; ++i)
+ assert(sizes[i] <= (1 << i));
+ code = 0;
+ for (i=1; i < 16; ++i) {
+ next_code[i] = code;
+ z->firstcode[i] = (stbi__uint16) code;
+ z->firstsymbol[i] = (stbi__uint16) k;
+ code = (code + sizes[i]);
+ if (sizes[i])
+ if (code-1 >= (1 << i)) return e("bad codelengths","Corrupt PNG");
+ z->maxcode[i] = code << (16-i); // preshift for inner loop
+ code <<= 1;
+ k += sizes[i];
+ }
+ z->maxcode[16] = 0x10000; // sentinel
+ for (i=0; i < num; ++i) {
+ int s = sizelist[i];
+ if (s) {
+ int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
+ z->size[c] = (stbi__uint8)s;
+ z->value[c] = (stbi__uint16)i;
+ if (s <= ZFAST_BITS) {
+ int k = bit_reverse(next_code[s],s);
+ while (k < (1 << ZFAST_BITS)) {
+ z->fast[k] = (stbi__uint16) c;
+ k += (1 << s);
+ }
+ }
+ ++next_code[s];
+ }
+ }
+ return 1;
+}
+
+// zlib-from-memory implementation for PNG reading
+// because PNG allows splitting the zlib stream arbitrarily,
+// and it's annoying structurally to have PNG call ZLIB call PNG,
+// we require PNG read all the IDATs and combine them into a single
+// memory buffer
+
+typedef struct
+{
+ stbi__uint8 *zbuffer, *zbuffer_end;
+ int num_bits;
+ stbi__uint32 code_buffer;
+
+ char *zout;
+ char *zout_start;
+ char *zout_end;
+ int z_expandable;
+
+ zhuffman z_length, z_distance;
+} zbuf;
+
+stbi_inline static int zget8(zbuf *z)
+{
+ if (z->zbuffer >= z->zbuffer_end) return 0;
+ return *z->zbuffer++;
+}
+
+static void fill_bits(zbuf *z)
+{
+ do {
+ assert(z->code_buffer < (1U << z->num_bits));
+ z->code_buffer |= zget8(z) << z->num_bits;
+ z->num_bits += 8;
+ } while (z->num_bits <= 24);
+}
+
+stbi_inline static unsigned int zreceive(zbuf *z, int n)
+{
+ unsigned int k;
+ if (z->num_bits < n) fill_bits(z);
+ k = z->code_buffer & ((1 << n) - 1);
+ z->code_buffer >>= n;
+ z->num_bits -= n;
+ return k;
+}
+
+stbi_inline static int zhuffman_decode(zbuf *a, zhuffman *z)
+{
+ int b,s,k;
+ if (a->num_bits < 16) fill_bits(a);
+ b = z->fast[a->code_buffer & ZFAST_MASK];
+ if (b < 0xffff) {
+ s = z->size[b];
+ a->code_buffer >>= s;
+ a->num_bits -= s;
+ return z->value[b];
+ }
+
+ // not resolved by fast table, so compute it the slow way
+ // use jpeg approach, which requires MSbits at top
+ k = bit_reverse(a->code_buffer, 16);
+ for (s=ZFAST_BITS+1; ; ++s)
+ if (k < z->maxcode[s])
+ break;
+ if (s == 16) return -1; // invalid code!
+ // code size is s, so:
+ b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
+ assert(z->size[b] == s);
+ a->code_buffer >>= s;
+ a->num_bits -= s;
+ return z->value[b];
+}
+
+static int expand(zbuf *z, int n) // need to make room for n bytes
+{
+ char *q;
+ int cur, limit;
+ if (!z->z_expandable) return e("output buffer limit","Corrupt PNG");
+ cur = (int) (z->zout - z->zout_start);
+ limit = (int) (z->zout_end - z->zout_start);
+ while (cur + n > limit)
+ limit *= 2;
+ q = (char *) realloc(z->zout_start, limit);
+ if (q == NULL) return e("outofmem", "Out of memory");
+ z->zout_start = q;
+ z->zout = q + cur;
+ z->zout_end = q + limit;
+ return 1;
+}
+
+static int length_base[31] = {
+ 3,4,5,6,7,8,9,10,11,13,
+ 15,17,19,23,27,31,35,43,51,59,
+ 67,83,99,115,131,163,195,227,258,0,0 };
+
+static int length_extra[31]=
+{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
+
+static int dist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
+257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
+
+static int dist_extra[32] =
+{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
+
+static int parse_huffman_block(zbuf *a)
+{
+ for(;;) {
+ int z = zhuffman_decode(a, &a->z_length);
+ if (z < 256) {
+ if (z < 0) return e("bad huffman code","Corrupt PNG"); // error in huffman codes
+ if (a->zout >= a->zout_end) if (!expand(a, 1)) return 0;
+ *a->zout++ = (char) z;
+ } else {
+ stbi__uint8 *p;
+ int len,dist;
+ if (z == 256) return 1;
+ z -= 257;
+ len = length_base[z];
+ if (length_extra[z]) len += zreceive(a, length_extra[z]);
+ z = zhuffman_decode(a, &a->z_distance);
+ if (z < 0) return e("bad huffman code","Corrupt PNG");
+ dist = dist_base[z];
+ if (dist_extra[z]) dist += zreceive(a, dist_extra[z]);
+ if (a->zout - a->zout_start < dist) return e("bad dist","Corrupt PNG");
+ if (a->zout + len > a->zout_end) if (!expand(a, len)) return 0;
+ p = (stbi__uint8 *) (a->zout - dist);
+ while (len--)
+ *a->zout++ = *p++;
+ }
+ }
+}
+
+static int compute_huffman_codes(zbuf *a)
+{
+ static stbi__uint8 length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
+ zhuffman z_codelength;
+ stbi__uint8 lencodes[286+32+137];//padding for maximum single op
+ stbi__uint8 codelength_sizes[19];
+ int i,n;
+
+ int hlit = zreceive(a,5) + 257;
+ int hdist = zreceive(a,5) + 1;
+ int hclen = zreceive(a,4) + 4;
+
+ memset(codelength_sizes, 0, sizeof(codelength_sizes));
+ for (i=0; i < hclen; ++i) {
+ int s = zreceive(a,3);
+ codelength_sizes[length_dezigzag[i]] = (stbi__uint8) s;
+ }
+ if (!zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
+
+ n = 0;
+ while (n < hlit + hdist) {
+ int c = zhuffman_decode(a, &z_codelength);
+ assert(c >= 0 && c < 19);
+ if (c < 16)
+ lencodes[n++] = (stbi__uint8) c;
+ else if (c == 16) {
+ c = zreceive(a,2)+3;
+ memset(lencodes+n, lencodes[n-1], c);
+ n += c;
+ } else if (c == 17) {
+ c = zreceive(a,3)+3;
+ memset(lencodes+n, 0, c);
+ n += c;
+ } else {
+ assert(c == 18);
+ c = zreceive(a,7)+11;
+ memset(lencodes+n, 0, c);
+ n += c;
+ }
+ }
+ if (n != hlit+hdist) return e("bad codelengths","Corrupt PNG");
+ if (!zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
+ if (!zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
+ return 1;
+}
+
+static int parse_uncompressed_block(zbuf *a)
+{
+ stbi__uint8 header[4];
+ int len,nlen,k;
+ if (a->num_bits & 7)
+ zreceive(a, a->num_bits & 7); // discard
+ // drain the bit-packed data into header
+ k = 0;
+ while (a->num_bits > 0) {
+ header[k++] = (stbi__uint8) (a->code_buffer & 255); // cast suppresses a spurious truncation warning
+ a->code_buffer >>= 8;
+ a->num_bits -= 8;
+ }
+ assert(a->num_bits == 0);
+ // now fill header the normal way
+ while (k < 4)
+ header[k++] = (stbi__uint8) zget8(a);
+ len = header[1] * 256 + header[0];
+ nlen = header[3] * 256 + header[2];
+ if (nlen != (len ^ 0xffff)) return e("zlib corrupt","Corrupt PNG");
+ if (a->zbuffer + len > a->zbuffer_end) return e("read past buffer","Corrupt PNG");
+ if (a->zout + len > a->zout_end)
+ if (!expand(a, len)) return 0;
+ memcpy(a->zout, a->zbuffer, len);
+ a->zbuffer += len;
+ a->zout += len;
+ return 1;
+}
+
+static int parse_zlib_header(zbuf *a)
+{
+ int cmf = zget8(a);
+ int cm = cmf & 15;
+ /* int cinfo = cmf >> 4; */
+ int flg = zget8(a);
+ if ((cmf*256+flg) % 31 != 0) return e("bad zlib header","Corrupt PNG"); // zlib spec
+ if (flg & 32) return e("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
+ if (cm != 8) return e("bad compression","Corrupt PNG"); // DEFLATE required for png
+ // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
+ return 1;
+}
+
+// @TODO: should statically initialize these for optimal thread safety
+static stbi__uint8 default_length[288], default_distance[32];
+static void init_defaults(void)
+{
+ int i; // use <= to match clearly with spec
+ for (i=0; i <= 143; ++i) default_length[i] = 8;
+ for ( ; i <= 255; ++i) default_length[i] = 9;
+ for ( ; i <= 279; ++i) default_length[i] = 7;
+ for ( ; i <= 287; ++i) default_length[i] = 8;
+
+ for (i=0; i <= 31; ++i) default_distance[i] = 5;
+}
+
+int stbi_png_partial; // a quick hack to only allow decoding some of a PNG... I should implement real streaming support instead
+static int parse_zlib(zbuf *a, int parse_header)
+{
+ int final, type;
+ if (parse_header)
+ if (!parse_zlib_header(a)) return 0;
+ a->num_bits = 0;
+ a->code_buffer = 0;
+ do {
+ final = zreceive(a,1);
+ type = zreceive(a,2);
+ if (type == 0) {
+ if (!parse_uncompressed_block(a)) return 0;
+ } else if (type == 3) {
+ return 0;
+ } else {
+ if (type == 1) {
+ // use fixed code lengths
+ if (!default_distance[31]) init_defaults();
+ if (!zbuild_huffman(&a->z_length , default_length , 288)) return 0;
+ if (!zbuild_huffman(&a->z_distance, default_distance, 32)) return 0;
+ } else {
+ if (!compute_huffman_codes(a)) return 0;
+ }
+ if (!parse_huffman_block(a)) return 0;
+ }
+ if (stbi_png_partial && a->zout - a->zout_start > 65536)
+ break;
+ } while (!final);
+ return 1;
+}
+
+static int do_zlib(zbuf *a, char *obuf, int olen, int exp, int parse_header)
+{
+ a->zout_start = obuf;
+ a->zout = obuf;
+ a->zout_end = obuf + olen;
+ a->z_expandable = exp;
+
+ return parse_zlib(a, parse_header);
+}
+
+char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
+{
+ zbuf a;
+ char *p = (char *) malloc(initial_size);
+ if (p == NULL) return NULL;
+ a.zbuffer = (stbi__uint8 *) buffer;
+ a.zbuffer_end = (stbi__uint8 *) buffer + len;
+ if (do_zlib(&a, p, initial_size, 1, 1)) {
+ if (outlen) *outlen = (int) (a.zout - a.zout_start);
+ return a.zout_start;
+ } else {
+ free(a.zout_start);
+ return NULL;
+ }
+}
+
+char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
+{
+ return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
+}
+
+char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
+{
+ zbuf a;
+ char *p = (char *) malloc(initial_size);
+ if (p == NULL) return NULL;
+ a.zbuffer = (stbi__uint8 *) buffer;
+ a.zbuffer_end = (stbi__uint8 *) buffer + len;
+ if (do_zlib(&a, p, initial_size, 1, parse_header)) {
+ if (outlen) *outlen = (int) (a.zout - a.zout_start);
+ return a.zout_start;
+ } else {
+ free(a.zout_start);
+ return NULL;
+ }
+}
+
+int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
+{
+ zbuf a;
+ a.zbuffer = (stbi__uint8 *) ibuffer;
+ a.zbuffer_end = (stbi__uint8 *) ibuffer + ilen;
+ if (do_zlib(&a, obuffer, olen, 0, 1))
+ return (int) (a.zout - a.zout_start);
+ else
+ return -1;
+}
+
+char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
+{
+ zbuf a;
+ char *p = (char *) malloc(16384);
+ if (p == NULL) return NULL;
+ a.zbuffer = (stbi__uint8 *) buffer;
+ a.zbuffer_end = (stbi__uint8 *) buffer+len;
+ if (do_zlib(&a, p, 16384, 1, 0)) {
+ if (outlen) *outlen = (int) (a.zout - a.zout_start);
+ return a.zout_start;
+ } else {
+ free(a.zout_start);
+ return NULL;
+ }
+}
+
+int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
+{
+ zbuf a;
+ a.zbuffer = (stbi__uint8 *) ibuffer;
+ a.zbuffer_end = (stbi__uint8 *) ibuffer + ilen;
+ if (do_zlib(&a, obuffer, olen, 0, 0))
+ return (int) (a.zout - a.zout_start);
+ else
+ return -1;
+}
+
+// public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
+// simple implementation
+// - only 8-bit samples
+// - no CRC checking
+// - allocates lots of intermediate memory
+// - avoids problem of streaming data between subsystems
+// - avoids explicit window management
+// performance
+// - uses stb_zlib, a PD zlib implementation with fast huffman decoding
+
+
+typedef struct
+{
+ stbi__uint32 length;
+ stbi__uint32 type;
+} chunk;
+
+#define PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
+
+static chunk get_chunk_header(stbi *s)
+{
+ chunk c;
+ c.length = get32(s);
+ c.type = get32(s);
+ return c;
+}
+
+static int check_png_header(stbi *s)
+{
+ static stbi__uint8 png_sig[8] = { 137,80,78,71,13,10,26,10 };
+ int i;
+ for (i=0; i < 8; ++i)
+ if (get8u(s) != png_sig[i]) return e("bad png sig","Not a PNG");
+ return 1;
+}
+
+typedef struct
+{
+ stbi *s;
+ stbi__uint8 *idata, *expanded, *out;
+} png;
+
+
+enum {
+ F_none=0, F_sub=1, F_up=2, F_avg=3, F_paeth=4,
+ F_avg_first, F_paeth_first
+};
+
+static stbi__uint8 first_row_filter[5] =
+{
+ F_none, F_sub, F_none, F_avg_first, F_paeth_first
+};
+
+static int paeth(int a, int b, int c)
+{
+ int p = a + b - c;
+ int pa = abs(p-a);
+ int pb = abs(p-b);
+ int pc = abs(p-c);
+ if (pa <= pb && pa <= pc) return a;
+ if (pb <= pc) return b;
+ return c;
+}
+
+// create the png data from post-deflated data
+static int create_png_image_raw(png *a, stbi__uint8 *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y)
+{
+ stbi *s = a->s;
+ stbi__uint32 i,j,stride = x*out_n;
+ int k;
+ int img_n = s->img_n; // copy it into a local for later
+ assert(out_n == s->img_n || out_n == s->img_n+1);
+ if (stbi_png_partial) y = 1;
+ a->out = (stbi__uint8 *) malloc(x * y * out_n);
+ if (!a->out) return e("outofmem", "Out of memory");
+ if (!stbi_png_partial) {
+ if (s->img_x == x && s->img_y == y) {
+ if (raw_len != (img_n * x + 1) * y) return e("not enough pixels","Corrupt PNG");
+ } else { // interlaced:
+ if (raw_len < (img_n * x + 1) * y) return e("not enough pixels","Corrupt PNG");
+ }
+ }
+ for (j=0; j < y; ++j) {
+ stbi__uint8 *cur = a->out + stride*j;
+ stbi__uint8 *prior = cur - stride;
+ int filter = *raw++;
+ if (filter > 4) return e("invalid filter","Corrupt PNG");
+ // if first row, use special filter that doesn't sample previous row
+ if (j == 0) filter = first_row_filter[filter];
+ // handle first pixel explicitly
+ for (k=0; k < img_n; ++k) {
+ switch (filter) {
+ case F_none : cur[k] = raw[k]; break;
+ case F_sub : cur[k] = raw[k]; break;
+ case F_up : cur[k] = raw[k] + prior[k]; break;
+ case F_avg : cur[k] = raw[k] + (prior[k]>>1); break;
+ case F_paeth : cur[k] = (stbi__uint8) (raw[k] + paeth(0,prior[k],0)); break;
+ case F_avg_first : cur[k] = raw[k]; break;
+ case F_paeth_first: cur[k] = raw[k]; break;
+ }
+ }
+ if (img_n != out_n) cur[img_n] = 255;
+ raw += img_n;
+ cur += out_n;
+ prior += out_n;
+ // this is a little gross, so that we don't switch per-pixel or per-component
+ if (img_n == out_n) {
+ #define CASE(f) \
+ case f: \
+ for (i=x-1; i >= 1; --i, raw+=img_n,cur+=img_n,prior+=img_n) \
+ for (k=0; k < img_n; ++k)
+ switch (filter) {
+ CASE(F_none) cur[k] = raw[k]; break;
+ CASE(F_sub) cur[k] = raw[k] + cur[k-img_n]; break;
+ CASE(F_up) cur[k] = raw[k] + prior[k]; break;
+ CASE(F_avg) cur[k] = raw[k] + ((prior[k] + cur[k-img_n])>>1); break;
+ CASE(F_paeth) cur[k] = (stbi__uint8) (raw[k] + paeth(cur[k-img_n],prior[k],prior[k-img_n])); break;
+ CASE(F_avg_first) cur[k] = raw[k] + (cur[k-img_n] >> 1); break;
+ CASE(F_paeth_first) cur[k] = (stbi__uint8) (raw[k] + paeth(cur[k-img_n],0,0)); break;
+ }
+ #undef CASE
+ } else {
+ assert(img_n+1 == out_n);
+ #define CASE(f) \
+ case f: \
+ for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
+ for (k=0; k < img_n; ++k)
+ switch (filter) {
+ CASE(F_none) cur[k] = raw[k]; break;
+ CASE(F_sub) cur[k] = raw[k] + cur[k-out_n]; break;
+ CASE(F_up) cur[k] = raw[k] + prior[k]; break;
+ CASE(F_avg) cur[k] = raw[k] + ((prior[k] + cur[k-out_n])>>1); break;
+ CASE(F_paeth) cur[k] = (stbi__uint8) (raw[k] + paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
+ CASE(F_avg_first) cur[k] = raw[k] + (cur[k-out_n] >> 1); break;
+ CASE(F_paeth_first) cur[k] = (stbi__uint8) (raw[k] + paeth(cur[k-out_n],0,0)); break;
+ }
+ #undef CASE
+ }
+ }
+ return 1;
+}
+
+static int create_png_image(png *a, stbi__uint8 *raw, stbi__uint32 raw_len, int out_n, int interlaced)
+{
+ stbi__uint8 *final;
+ int p;
+ int save;
+ if (!interlaced)
+ return create_png_image_raw(a, raw, raw_len, out_n, a->s->img_x, a->s->img_y);
+ save = stbi_png_partial;
+ stbi_png_partial = 0;
+
+ // de-interlacing
+ final = (stbi__uint8 *) malloc(a->s->img_x * a->s->img_y * out_n);
+ if (!final) return e("outofmem", "Out of memory");
+ for (p=0; p < 7; ++p) {
+ int xorig[] = { 0,4,0,2,0,1,0 };
+ int yorig[] = { 0,0,4,0,2,0,1 };
+ int xspc[] = { 8,8,4,4,2,2,1 };
+ int yspc[] = { 8,8,8,4,4,2,2 };
+ int i,j,x,y;
+ // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
+ x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
+ y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
+ if (x && y) {
+ if (!create_png_image_raw(a, raw, raw_len, out_n, x, y)) {
+ free(final);
+ return 0;
+ }
+ for (j=0; j < y; ++j)
+ for (i=0; i < x; ++i)
+ memcpy(final + (j*yspc[p]+yorig[p])*a->s->img_x*out_n + (i*xspc[p]+xorig[p])*out_n,
+ a->out + (j*x+i)*out_n, out_n);
+ free(a->out);
+ raw += (x*out_n+1)*y;
+ raw_len -= (x*out_n+1)*y;
+ }
+ }
+ a->out = final;
+
+ stbi_png_partial = save;
+ return 1;
+}
+
+static int compute_transparency(png *z, stbi__uint8 tc[3], int out_n)
+{
+ stbi *s = z->s;
+ stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+ stbi__uint8 *p = z->out;
+
+ // compute color-based transparency, assuming we've
+ // already got 255 as the alpha value in the output
+ assert(out_n == 2 || out_n == 4);
+
+ if (out_n == 2) {
+ for (i=0; i < pixel_count; ++i) {
+ p[1] = (p[0] == tc[0] ? 0 : 255);
+ p += 2;
+ }
+ } else {
+ for (i=0; i < pixel_count; ++i) {
+ if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
+ p[3] = 0;
+ p += 4;
+ }
+ }
+ return 1;
+}
+
+static int expand_palette(png *a, stbi__uint8 *palette, int len, int pal_img_n)
+{
+ stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
+ stbi__uint8 *p, *temp_out, *orig = a->out;
+
+ p = (stbi__uint8 *) malloc(pixel_count * pal_img_n);
+ if (p == NULL) return e("outofmem", "Out of memory");
+
+ // between here and free(a->out) below, exiting early would leak temp_out
+ temp_out = p;
+
+ if (pal_img_n == 3) {
+ for (i=0; i < pixel_count; ++i) {
+ int n = orig[i]*4;
+ p[0] = palette[n ];
+ p[1] = palette[n+1];
+ p[2] = palette[n+2];
+ p += 3;
+ }
+ } else {
+ for (i=0; i < pixel_count; ++i) {
+ int n = orig[i]*4;
+ p[0] = palette[n ];
+ p[1] = palette[n+1];
+ p[2] = palette[n+2];
+ p[3] = palette[n+3];
+ p += 4;
+ }
+ }
+ free(a->out);
+ a->out = temp_out;
+
+ STBI_NOTUSED(len);
+
+ return 1;
+}
+
+static int stbi_unpremultiply_on_load = 0;
+static int stbi_de_iphone_flag = 0;
+
+void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
+{
+ stbi_unpremultiply_on_load = flag_true_if_should_unpremultiply;
+}
+void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
+{
+ stbi_de_iphone_flag = flag_true_if_should_convert;
+}
+
+static void stbi_de_iphone(png *z)
+{
+ stbi *s = z->s;
+ stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+ stbi__uint8 *p = z->out;
+
+ if (s->img_out_n == 3) { // convert bgr to rgb
+ for (i=0; i < pixel_count; ++i) {
+ stbi__uint8 t = p[0];
+ p[0] = p[2];
+ p[2] = t;
+ p += 3;
+ }
+ } else {
+ assert(s->img_out_n == 4);
+ if (stbi_unpremultiply_on_load) {
+ // convert bgr to rgb and unpremultiply
+ for (i=0; i < pixel_count; ++i) {
+ stbi__uint8 a = p[3];
+ stbi__uint8 t = p[0];
+ if (a) {
+ p[0] = p[2] * 255 / a;
+ p[1] = p[1] * 255 / a;
+ p[2] = t * 255 / a;
+ } else {
+ p[0] = p[2];
+ p[2] = t;
+ }
+ p += 4;
+ }
+ } else {
+ // convert bgr to rgb
+ for (i=0; i < pixel_count; ++i) {
+ stbi__uint8 t = p[0];
+ p[0] = p[2];
+ p[2] = t;
+ p += 4;
+ }
+ }
+ }
+}
+
+static int parse_png_file(png *z, int scan, int req_comp)
+{
+ stbi__uint8 palette[1024], pal_img_n=0;
+ stbi__uint8 has_trans=0, tc[3];
+ stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
+ int first=1,k,interlace=0, iphone=0;
+ stbi *s = z->s;
+
+ z->expanded = NULL;
+ z->idata = NULL;
+ z->out = NULL;
+
+ if (!check_png_header(s)) return 0;
+
+ if (scan == SCAN_type) return 1;
+
+ for (;;) {
+ chunk c = get_chunk_header(s);
+ switch (c.type) {
+ case PNG_TYPE('C','g','B','I'):
+ iphone = stbi_de_iphone_flag;
+ skip(s, c.length);
+ break;
+ case PNG_TYPE('I','H','D','R'): {
+ int depth,color,comp,filter;
+ if (!first) return e("multiple IHDR","Corrupt PNG");
+ first = 0;
+ if (c.length != 13) return e("bad IHDR len","Corrupt PNG");
+ s->img_x = get32(s); if (s->img_x > (1 << 24)) return e("too large","Very large image (corrupt?)");
+ s->img_y = get32(s); if (s->img_y > (1 << 24)) return e("too large","Very large image (corrupt?)");
+ depth = get8(s); if (depth != 8) return e("8bit only","PNG not supported: 8-bit only");
+ color = get8(s); if (color > 6) return e("bad ctype","Corrupt PNG");
+ if (color == 3) pal_img_n = 3; else if (color & 1) return e("bad ctype","Corrupt PNG");
+ comp = get8(s); if (comp) return e("bad comp method","Corrupt PNG");
+ filter= get8(s); if (filter) return e("bad filter method","Corrupt PNG");
+ interlace = get8(s); if (interlace>1) return e("bad interlace method","Corrupt PNG");
+ if (!s->img_x || !s->img_y) return e("0-pixel image","Corrupt PNG");
+ if (!pal_img_n) {
+ s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
+ if ((1 << 30) / s->img_x / s->img_n < s->img_y) return e("too large", "Image too large to decode");
+ if (scan == SCAN_header) return 1;
+ } else {
+ // if paletted, then pal_n is our final components, and
+ // img_n is # components to decompress/filter.
+ s->img_n = 1;
+ if ((1 << 30) / s->img_x / 4 < s->img_y) return e("too large","Corrupt PNG");
+ // if SCAN_header, have to scan to see if we have a tRNS
+ }
+ break;
+ }
+
+ case PNG_TYPE('P','L','T','E'): {
+ if (first) return e("first not IHDR", "Corrupt PNG");
+ if (c.length > 256*3) return e("invalid PLTE","Corrupt PNG");
+ pal_len = c.length / 3;
+ if (pal_len * 3 != c.length) return e("invalid PLTE","Corrupt PNG");
+ for (i=0; i < pal_len; ++i) {
+ palette[i*4+0] = get8u(s);
+ palette[i*4+1] = get8u(s);
+ palette[i*4+2] = get8u(s);
+ palette[i*4+3] = 255;
+ }
+ break;
+ }
+
+ case PNG_TYPE('t','R','N','S'): {
+ if (first) return e("first not IHDR", "Corrupt PNG");
+ if (z->idata) return e("tRNS after IDAT","Corrupt PNG");
+ if (pal_img_n) {
+ if (scan == SCAN_header) { s->img_n = 4; return 1; }
+ if (pal_len == 0) return e("tRNS before PLTE","Corrupt PNG");
+ if (c.length > pal_len) return e("bad tRNS len","Corrupt PNG");
+ pal_img_n = 4;
+ for (i=0; i < c.length; ++i)
+ palette[i*4+3] = get8u(s);
+ } else {
+ if (!(s->img_n & 1)) return e("tRNS with alpha","Corrupt PNG");
+ if (c.length != (stbi__uint32) s->img_n*2) return e("bad tRNS len","Corrupt PNG");
+ has_trans = 1;
+ for (k=0; k < s->img_n; ++k)
+ tc[k] = (stbi__uint8) get16(s); // non 8-bit images will be larger
+ }
+ break;
+ }
+
+ case PNG_TYPE('I','D','A','T'): {
+ if (first) return e("first not IHDR", "Corrupt PNG");
+ if (pal_img_n && !pal_len) return e("no PLTE","Corrupt PNG");
+ if (scan == SCAN_header) { s->img_n = pal_img_n; return 1; }
+ if (ioff + c.length > idata_limit) {
+ stbi__uint8 *p;
+ if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
+ while (ioff + c.length > idata_limit)
+ idata_limit *= 2;
+ p = (stbi__uint8 *) realloc(z->idata, idata_limit); if (p == NULL) return e("outofmem", "Out of memory");
+ z->idata = p;
+ }
+ if (!getn(s, z->idata+ioff,c.length)) return e("outofdata","Corrupt PNG");
+ ioff += c.length;
+ break;
+ }
+
+ case PNG_TYPE('I','E','N','D'): {
+ stbi__uint32 raw_len;
+ if (first) return e("first not IHDR", "Corrupt PNG");
+ if (scan != SCAN_load) return 1;
+ if (z->idata == NULL) return e("no IDAT","Corrupt PNG");
+ z->expanded = (stbi__uint8 *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, 16384, (int *) &raw_len, !iphone);
+ if (z->expanded == NULL) return 0; // zlib should set error
+ free(z->idata); z->idata = NULL;
+ if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
+ s->img_out_n = s->img_n+1;
+ else
+ s->img_out_n = s->img_n;
+ if (!create_png_image(z, z->expanded, raw_len, s->img_out_n, interlace)) return 0;
+ if (has_trans)
+ if (!compute_transparency(z, tc, s->img_out_n)) return 0;
+ if (iphone && s->img_out_n > 2)
+ stbi_de_iphone(z);
+ if (pal_img_n) {
+ // pal_img_n == 3 or 4
+ s->img_n = pal_img_n; // record the actual colors we had
+ s->img_out_n = pal_img_n;
+ if (req_comp >= 3) s->img_out_n = req_comp;
+ if (!expand_palette(z, palette, pal_len, s->img_out_n))
+ return 0;
+ }
+ free(z->expanded); z->expanded = NULL;
+ return 1;
+ }
+
+ default:
+ // if critical, fail
+ if (first) return e("first not IHDR", "Corrupt PNG");
+ if ((c.type & (1 << 29)) == 0) {
+ #ifndef STBI_NO_FAILURE_STRINGS
+ // not threadsafe
+ static char invalid_chunk[] = "XXXX chunk not known";
+ invalid_chunk[0] = (stbi__uint8) (c.type >> 24);
+ invalid_chunk[1] = (stbi__uint8) (c.type >> 16);
+ invalid_chunk[2] = (stbi__uint8) (c.type >> 8);
+ invalid_chunk[3] = (stbi__uint8) (c.type >> 0);
+ #endif
+ return e(invalid_chunk, "PNG not supported: unknown chunk type");
+ }
+ skip(s, c.length);
+ break;
+ }
+ // end of chunk, read and skip CRC
+ get32(s);
+ }
+}
+
+static unsigned char *do_png(png *p, int *x, int *y, int *n, int req_comp)
+{
+ unsigned char *result=NULL;
+ if (req_comp < 0 || req_comp > 4) return epuc("bad req_comp", "Internal error");
+ if (parse_png_file(p, SCAN_load, req_comp)) {
+ result = p->out;
+ p->out = NULL;
+ if (req_comp && req_comp != p->s->img_out_n) {
+ result = convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
+ p->s->img_out_n = req_comp;
+ if (result == NULL) return result;
+ }
+ *x = p->s->img_x;
+ *y = p->s->img_y;
+ if (n) *n = p->s->img_n;
+ }
+ free(p->out); p->out = NULL;
+ free(p->expanded); p->expanded = NULL;
+ free(p->idata); p->idata = NULL;
+
+ return result;
+}
+
+static unsigned char *stbi_png_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ png p;
+ p.s = s;
+ return do_png(&p, x,y,comp,req_comp);
+}
+
+static int stbi_png_test(stbi *s)
+{
+ int r;
+ r = check_png_header(s);
+ stbi_rewind(s);
+ return r;
+}
+
+static int stbi_png_info_raw(png *p, int *x, int *y, int *comp)
+{
+ if (!parse_png_file(p, SCAN_header, 0)) {
+ stbi_rewind( p->s );
+ return 0;
+ }
+ if (x) *x = p->s->img_x;
+ if (y) *y = p->s->img_y;
+ if (comp) *comp = p->s->img_n;
+ return 1;
+}
+
+static int stbi_png_info(stbi *s, int *x, int *y, int *comp)
+{
+ png p;
+ p.s = s;
+ return stbi_png_info_raw(&p, x, y, comp);
+}
+
+// Microsoft/Windows BMP image
+
+static int bmp_test(stbi *s)
+{
+ int sz;
+ if (get8(s) != 'B') return 0;
+ if (get8(s) != 'M') return 0;
+ get32le(s); // discard filesize
+ get16le(s); // discard reserved
+ get16le(s); // discard reserved
+ get32le(s); // discard data offset
+ sz = get32le(s);
+ if (sz == 12 || sz == 40 || sz == 56 || sz == 108) return 1;
+ return 0;
+}
+
+static int stbi_bmp_test(stbi *s)
+{
+ int r = bmp_test(s);
+ stbi_rewind(s);
+ return r;
+}
+
+
+// returns 0..31 for the highest set bit, or -1 if no bit is set
+static int high_bit(unsigned int z)
+{
+ int n=0;
+ if (z == 0) return -1;
+ if (z >= 0x10000) n += 16, z >>= 16;
+ if (z >= 0x00100) n += 8, z >>= 8;
+ if (z >= 0x00010) n += 4, z >>= 4;
+ if (z >= 0x00004) n += 2, z >>= 2;
+ if (z >= 0x00002) n += 1, z >>= 1;
+ return n;
+}
+
+static int bitcount(unsigned int a)
+{
+ a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
+ a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
+ a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
+ a = (a + (a >> 8)); // max 16 per 8 bits
+ a = (a + (a >> 16)); // max 32 per 8 bits
+ return a & 0xff;
+}
+
+static int shiftsigned(int v, int shift, int bits)
+{
+ int result;
+ int z=0;
+
+ if (shift < 0) v <<= -shift;
+ else v >>= shift;
+ result = v;
+
+ z = bits;
+ while (z < 8) {
+ result += v >> z;
+ z += bits;
+ }
+ return result;
+}
+
+static stbi_uc *bmp_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__uint8 *out;
+ unsigned int mr=0,mg=0,mb=0,ma=0, fake_a=0;
+ stbi_uc pal[256][4];
+ int psize=0,i,j,compress=0,width;
+ int bpp, flip_vertically, pad, target, offset, hsz;
+ if (get8(s) != 'B' || get8(s) != 'M') return epuc("not BMP", "Corrupt BMP");
+ get32le(s); // discard filesize
+ get16le(s); // discard reserved
+ get16le(s); // discard reserved
+ offset = get32le(s);
+ hsz = get32le(s);
+ if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108) return epuc("unknown BMP", "BMP type not supported: unknown");
+ if (hsz == 12) {
+ s->img_x = get16le(s);
+ s->img_y = get16le(s);
+ } else {
+ s->img_x = get32le(s);
+ s->img_y = get32le(s);
+ }
+ if (get16le(s) != 1) return epuc("bad BMP", "bad BMP");
+ bpp = get16le(s);
+ if (bpp == 1) return epuc("monochrome", "BMP type not supported: 1-bit");
+ flip_vertically = ((int) s->img_y) > 0;
+ s->img_y = abs((int) s->img_y);
+ if (hsz == 12) {
+ if (bpp < 24)
+ psize = (offset - 14 - 24) / 3;
+ } else {
+ compress = get32le(s);
+ if (compress == 1 || compress == 2) return epuc("BMP RLE", "BMP type not supported: RLE");
+ get32le(s); // discard sizeof
+ get32le(s); // discard hres
+ get32le(s); // discard vres
+ get32le(s); // discard colorsused
+ get32le(s); // discard max important
+ if (hsz == 40 || hsz == 56) {
+ if (hsz == 56) {
+ get32le(s);
+ get32le(s);
+ get32le(s);
+ get32le(s);
+ }
+ if (bpp == 16 || bpp == 32) {
+ mr = mg = mb = 0;
+ if (compress == 0) {
+ if (bpp == 32) {
+ mr = 0xffu << 16;
+ mg = 0xffu << 8;
+ mb = 0xffu << 0;
+ ma = 0xffu << 24;
+ fake_a = 1; // @TODO: check for cases like alpha value is all 0 and switch it to 255
+ STBI_NOTUSED(fake_a);
+ } else {
+ mr = 31u << 10;
+ mg = 31u << 5;
+ mb = 31u << 0;
+ }
+ } else if (compress == 3) {
+ mr = get32le(s);
+ mg = get32le(s);
+ mb = get32le(s);
+ // not documented, but generated by photoshop and handled by mspaint
+ if (mr == mg && mg == mb) {
+ // identical masks can't describe three distinct channels; reject
+ return epuc("bad BMP", "bad BMP");
+ }
+ } else
+ return epuc("bad BMP", "bad BMP");
+ }
+ } else {
+ assert(hsz == 108);
+ mr = get32le(s);
+ mg = get32le(s);
+ mb = get32le(s);
+ ma = get32le(s);
+ get32le(s); // discard color space
+ for (i=0; i < 12; ++i)
+ get32le(s); // discard color space parameters
+ }
+ if (bpp < 16)
+ psize = (offset - 14 - hsz) >> 2;
+ }
+ s->img_n = ma ? 4 : 3;
+ if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
+ target = req_comp;
+ else
+ target = s->img_n; // if they want monochrome, we'll post-convert
+ out = (stbi_uc *) malloc(target * s->img_x * s->img_y);
+ if (!out) return epuc("outofmem", "Out of memory");
+ if (bpp < 16) {
+ int z=0;
+ if (psize == 0 || psize > 256) { free(out); return epuc("invalid", "Corrupt BMP"); }
+ for (i=0; i < psize; ++i) {
+ pal[i][2] = get8u(s);
+ pal[i][1] = get8u(s);
+ pal[i][0] = get8u(s);
+ if (hsz != 12) get8(s);
+ pal[i][3] = 255;
+ }
+ skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
+ if (bpp == 4) width = (s->img_x + 1) >> 1;
+ else if (bpp == 8) width = s->img_x;
+ else { free(out); return epuc("bad bpp", "Corrupt BMP"); }
+ pad = (-width)&3;
+ for (j=0; j < (int) s->img_y; ++j) {
+ for (i=0; i < (int) s->img_x; i += 2) {
+ int v=get8(s),v2=0;
+ if (bpp == 4) {
+ v2 = v & 15;
+ v >>= 4;
+ }
+ out[z++] = pal[v][0];
+ out[z++] = pal[v][1];
+ out[z++] = pal[v][2];
+ if (target == 4) out[z++] = 255;
+ if (i+1 == (int) s->img_x) break;
+ v = (bpp == 8) ? get8(s) : v2;
+ out[z++] = pal[v][0];
+ out[z++] = pal[v][1];
+ out[z++] = pal[v][2];
+ if (target == 4) out[z++] = 255;
+ }
+ skip(s, pad);
+ }
+ } else {
+ int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
+ int z = 0;
+ int easy=0;
+ skip(s, offset - 14 - hsz);
+ if (bpp == 24) width = 3 * s->img_x;
+ else if (bpp == 16) width = 2*s->img_x;
+ else /* bpp = 32 and pad = 0 */ width=0;
+ pad = (-width) & 3;
+ if (bpp == 24) {
+ easy = 1;
+ } else if (bpp == 32) {
+ if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
+ easy = 2;
+ }
+ if (!easy) {
+ if (!mr || !mg || !mb) { free(out); return epuc("bad masks", "Corrupt BMP"); }
+ // right shift amt to put high bit in position #7
+ rshift = high_bit(mr)-7; rcount = bitcount(mr);
+ gshift = high_bit(mg)-7; gcount = bitcount(mg);
+ bshift = high_bit(mb)-7; bcount = bitcount(mb);
+ ashift = high_bit(ma)-7; acount = bitcount(ma);
+ }
+ for (j=0; j < (int) s->img_y; ++j) {
+ if (easy) {
+ for (i=0; i < (int) s->img_x; ++i) {
+ int a;
+ out[z+2] = get8u(s);
+ out[z+1] = get8u(s);
+ out[z+0] = get8u(s);
+ z += 3;
+ a = (easy == 2 ? get8(s) : 255);
+ if (target == 4) out[z++] = (stbi__uint8) a;
+ }
+ } else {
+ for (i=0; i < (int) s->img_x; ++i) {
+ stbi__uint32 v = (stbi__uint32) (bpp == 16 ? get16le(s) : get32le(s));
+ int a;
+ out[z++] = (stbi__uint8) shiftsigned(v & mr, rshift, rcount);
+ out[z++] = (stbi__uint8) shiftsigned(v & mg, gshift, gcount);
+ out[z++] = (stbi__uint8) shiftsigned(v & mb, bshift, bcount);
+ a = (ma ? shiftsigned(v & ma, ashift, acount) : 255);
+ if (target == 4) out[z++] = (stbi__uint8) a;
+ }
+ }
+ skip(s, pad);
+ }
+ }
+ if (flip_vertically) {
+ stbi_uc t;
+ for (j=0; j < (int) s->img_y>>1; ++j) {
+ stbi_uc *p1 = out + j *s->img_x*target;
+ stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
+ for (i=0; i < (int) s->img_x*target; ++i) {
+ t = p1[i], p1[i] = p2[i], p2[i] = t;
+ }
+ }
+ }
+
+ if (req_comp && req_comp != target) {
+ out = convert_format(out, target, req_comp, s->img_x, s->img_y);
+ if (out == NULL) return out; // convert_format frees input on failure
+ }
+
+ *x = s->img_x;
+ *y = s->img_y;
+ if (comp) *comp = s->img_n;
+ return out;
+}
+
+static stbi_uc *stbi_bmp_load(stbi *s,int *x, int *y, int *comp, int req_comp)
+{
+ return bmp_load(s, x,y,comp,req_comp);
+}
+
+
+// Targa Truevision - TGA
+// by Jonathan Dummer
+
+static int tga_info(stbi *s, int *x, int *y, int *comp)
+{
+ int tga_w, tga_h, tga_comp;
+ int sz;
+ get8u(s); // discard Offset
+ sz = get8u(s); // color type
+ if( sz > 1 ) {
+ stbi_rewind(s);
+ return 0; // only RGB or indexed allowed
+ }
+ sz = get8u(s); // image type
+ // only RGB or grey allowed, +/- RLE
+ if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) {
+ stbi_rewind(s);
+ return 0;
+ }
+ skip(s,9);
+ tga_w = get16le(s);
+ if( tga_w < 1 ) {
+ stbi_rewind(s);
+ return 0; // test width
+ }
+ tga_h = get16le(s);
+ if( tga_h < 1 ) {
+ stbi_rewind(s);
+ return 0; // test height
+ }
+ sz = get8(s); // bits per pixel
+ // only RGB or RGBA or grey allowed
+ if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
+ stbi_rewind(s);
+ return 0;
+ }
+ tga_comp = sz;
+ if (x) *x = tga_w;
+ if (y) *y = tga_h;
+ if (comp) *comp = tga_comp / 8;
+ return 1; // seems to have passed everything
+}
+
+int stbi_tga_info(stbi *s, int *x, int *y, int *comp)
+{
+ return tga_info(s, x, y, comp);
+}
+
+static int tga_test(stbi *s)
+{
+ int sz;
+ get8u(s); // discard Offset
+ sz = get8u(s); // color type
+ if ( sz > 1 ) return 0; // only RGB or indexed allowed
+ sz = get8u(s); // image type
+ if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) return 0; // only RGB or grey allowed, +/- RLE
+ get16(s); // discard palette start
+ get16(s); // discard palette length
+ get8(s); // discard bits per palette color entry
+ get16(s); // discard x origin
+ get16(s); // discard y origin
+ if ( get16(s) < 1 ) return 0; // test width
+ if ( get16(s) < 1 ) return 0; // test height
+ sz = get8(s); // bits per pixel
+ if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) ) return 0; // only RGB or RGBA or grey allowed
+ return 1; // seems to have passed everything
+}
+
+static int stbi_tga_test(stbi *s)
+{
+ int res = tga_test(s);
+ stbi_rewind(s);
+ return res;
+}
+
+static stbi_uc *tga_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ // read in the TGA header stuff
+ int tga_offset = get8u(s);
+ int tga_indexed = get8u(s);
+ int tga_image_type = get8u(s);
+ int tga_is_RLE = 0;
+ int tga_palette_start = get16le(s);
+ int tga_palette_len = get16le(s);
+ int tga_palette_bits = get8u(s);
+ int tga_x_origin = get16le(s);
+ int tga_y_origin = get16le(s);
+ int tga_width = get16le(s);
+ int tga_height = get16le(s);
+ int tga_bits_per_pixel = get8u(s);
+ int tga_comp = tga_bits_per_pixel / 8;
+ int tga_inverted = get8u(s);
+ // image data
+ unsigned char *tga_data;
+ unsigned char *tga_palette = NULL;
+ int i, j;
+ unsigned char raw_data[4];
+ int RLE_count = 0;
+ int RLE_repeating = 0;
+ int read_next_pixel = 1;
+
+ // do a tiny bit of preprocessing
+ if ( tga_image_type >= 8 )
+ {
+ tga_image_type -= 8;
+ tga_is_RLE = 1;
+ }
+ /* int tga_alpha_bits = tga_inverted & 15; */
+ tga_inverted = 1 - ((tga_inverted >> 5) & 1);
+
+ // error check
+ if ( //(tga_indexed) ||
+ (tga_width < 1) || (tga_height < 1) ||
+ (tga_image_type < 1) || (tga_image_type > 3) ||
+ ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
+ (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
+ )
+ {
+ return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
+ }
+
+ // If I'm paletted, then I'll use the number of bits from the palette
+ if ( tga_indexed )
+ {
+ tga_comp = tga_palette_bits / 8;
+ }
+
+ // tga info
+ *x = tga_width;
+ *y = tga_height;
+ if (comp) *comp = tga_comp;
+
+ // rows are stored with tga_comp components; converted to req_comp at the end
+ tga_data = (unsigned char*)malloc( tga_width * tga_height * tga_comp );
+ if (!tga_data) return epuc("outofmem", "Out of memory");
+
+ // skip to the data's starting position (offset usually = 0)
+ skip(s, tga_offset );
+
+ if ( !tga_indexed && !tga_is_RLE) {
+ for (i=0; i < tga_height; ++i) {
+ int y = tga_inverted ? tga_height -i - 1 : i;
+ stbi__uint8 *tga_row = tga_data + y*tga_width*tga_comp;
+ getn(s, tga_row, tga_width * tga_comp);
+ }
+ } else {
+ // do I need to load a palette?
+ if ( tga_indexed)
+ {
+ // any data to skip? (offset usually = 0)
+ skip(s, tga_palette_start );
+ // load the palette
+ tga_palette = (unsigned char*)malloc( tga_palette_len * tga_palette_bits / 8 );
+ if (!tga_palette) {
+ free(tga_data);
+ return epuc("outofmem", "Out of memory");
+ }
+ if (!getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
+ free(tga_data);
+ free(tga_palette);
+ return epuc("bad palette", "Corrupt TGA");
+ }
+ }
+ // load the data
+ for (i=0; i < tga_width * tga_height; ++i)
+ {
+ // if I'm in RLE mode, do I need to get a RLE chunk?
+ if ( tga_is_RLE )
+ {
+ if ( RLE_count == 0 )
+ {
+ // yep, get the next byte as a RLE command
+ int RLE_cmd = get8u(s);
+ RLE_count = 1 + (RLE_cmd & 127);
+ RLE_repeating = RLE_cmd >> 7;
+ read_next_pixel = 1;
+ } else if ( !RLE_repeating )
+ {
+ read_next_pixel = 1;
+ }
+ } else
+ {
+ read_next_pixel = 1;
+ }
+ // OK, if I need to read a pixel, do it now
+ if ( read_next_pixel )
+ {
+ // load however much data we did have
+ if ( tga_indexed )
+ {
+ // read in 1 byte, then perform the lookup
+ int pal_idx = get8u(s);
+ if ( pal_idx >= tga_palette_len )
+ {
+ // invalid index
+ pal_idx = 0;
+ }
+ pal_idx *= tga_bits_per_pixel / 8;
+ for (j = 0; j*8 < tga_bits_per_pixel; ++j)
+ {
+ raw_data[j] = tga_palette[pal_idx+j];
+ }
+ } else
+ {
+ // read in the data raw
+ for (j = 0; j*8 < tga_bits_per_pixel; ++j)
+ {
+ raw_data[j] = get8u(s);
+ }
+ }
+ // clear the reading flag for the next pixel
+ read_next_pixel = 0;
+ } // end of reading a pixel
+
+ // copy data
+ for (j = 0; j < tga_comp; ++j)
+ tga_data[i*tga_comp+j] = raw_data[j];
+
+ // in case we're in RLE mode, keep counting down
+ --RLE_count;
+ }
+ // do I need to invert the image?
+ if ( tga_inverted )
+ {
+ for (j = 0; j*2 < tga_height; ++j)
+ {
+ int index1 = j * tga_width * req_comp;
+ int index2 = (tga_height - 1 - j) * tga_width * req_comp;
+ for (i = tga_width * req_comp; i > 0; --i)
+ {
+ unsigned char temp = tga_data[index1];
+ tga_data[index1] = tga_data[index2];
+ tga_data[index2] = temp;
+ ++index1;
+ ++index2;
+ }
+ }
+ }
+ // clear my palette, if I had one
+ if ( tga_palette != NULL )
+ {
+ free( tga_palette );
+ }
+ }
+
+ // swap RGB
+ if (tga_comp >= 3)
+ {
+ unsigned char* tga_pixel = tga_data;
+ for (i=0; i < tga_width * tga_height; ++i)
+ {
+ unsigned char temp = tga_pixel[0];
+ tga_pixel[0] = tga_pixel[2];
+ tga_pixel[2] = temp;
+ tga_pixel += tga_comp;
+ }
+ }
+
+ // convert to target component count
+ if (req_comp && req_comp != tga_comp)
+ tga_data = convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
+
+ // the things I do to get rid of an error message, and yet keep
+ // Microsoft's C compilers happy... [8^(
+ tga_palette_start = tga_palette_len = tga_palette_bits =
+ tga_x_origin = tga_y_origin = 0;
+ // OK, done
+ return tga_data;
+}
+
+static stbi_uc *stbi_tga_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ return tga_load(s,x,y,comp,req_comp);
+}
+
+
+// *************************************************************************************************
+// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
+
+static int psd_test(stbi *s)
+{
+ if (get32(s) != 0x38425053) return 0; // "8BPS"
+ else return 1;
+}
+
+static int stbi_psd_test(stbi *s)
+{
+ int r = psd_test(s);
+ stbi_rewind(s);
+ return r;
+}
+
+static stbi_uc *psd_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ int pixelCount;
+ int channelCount, compression;
+ int channel, i, count, len;
+ int w,h;
+ stbi__uint8 *out;
+
+ // Check identifier
+ if (get32(s) != 0x38425053) // "8BPS"
+ return epuc("not PSD", "Corrupt PSD image");
+
+ // Check file type version.
+ if (get16(s) != 1)
+ return epuc("wrong version", "Unsupported version of PSD image");
+
+ // Skip 6 reserved bytes.
+ skip(s, 6 );
+
+ // Read the number of channels (R, G, B, A, etc).
+ channelCount = get16(s);
+ if (channelCount < 0 || channelCount > 16)
+ return epuc("wrong channel count", "Unsupported number of channels in PSD image");
+
+ // Read the rows and columns of the image.
+ h = get32(s);
+ w = get32(s);
+
+ // Make sure the depth is 8 bits.
+ if (get16(s) != 8)
+ return epuc("unsupported bit depth", "PSD bit depth is not 8 bit");
+
+ // Make sure the color mode is RGB.
+ // Valid options are:
+ // 0: Bitmap
+ // 1: Grayscale
+ // 2: Indexed color
+ // 3: RGB color
+ // 4: CMYK color
+ // 7: Multichannel
+ // 8: Duotone
+ // 9: Lab color
+ if (get16(s) != 3)
+ return epuc("wrong color format", "PSD is not in RGB color format");
+
+ // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
+ skip(s,get32(s) );
+
+ // Skip the image resources. (resolution, pen tool paths, etc)
+ skip(s, get32(s) );
+
+ // Skip the reserved data.
+ skip(s, get32(s) );
+
+ // Find out if the data is compressed.
+ // Known values:
+ // 0: no compression
+ // 1: RLE compressed
+ compression = get16(s);
+ if (compression > 1)
+ return epuc("bad compression", "PSD has an unknown compression format");
+
+ // Create the destination image.
+ out = (stbi_uc *) malloc(4 * w*h);
+ if (!out) return epuc("outofmem", "Out of memory");
+ pixelCount = w*h;
+
+ // Initialize the data to zero.
+ //memset( out, 0, pixelCount * 4 );
+
+ // Finally, the image data.
+ if (compression) {
+ // RLE as used by .PSD and .TIFF
+ // Loop until you get the number of unpacked bytes you are expecting:
+ // Read the next source byte into n.
+ // If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
+ // Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
+ // Else if n is 128, noop.
+ // Endloop
+
+ // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
+ // which we're going to just skip.
+ skip(s, h * channelCount * 2 );
+
+ // Read the RLE data by channel.
+ for (channel = 0; channel < 4; channel++) {
+ stbi__uint8 *p;
+
+ p = out+channel;
+ if (channel >= channelCount) {
+ // Fill this channel with default data.
+ for (i = 0; i < pixelCount; i++) *p = (channel == 3 ? 255 : 0), p += 4;
+ } else {
+ // Read the RLE data.
+ count = 0;
+ while (count < pixelCount) {
+ len = get8(s);
+ if (len == 128) {
+ // No-op.
+ } else if (len < 128) {
+ // Copy next len+1 bytes literally.
+ len++;
+ count += len;
+ while (len) {
+ *p = get8u(s);
+ p += 4;
+ len--;
+ }
+ } else if (len > 128) {
+ stbi__uint8 val;
+ // Next -len+1 bytes in the dest are replicated from next source byte.
+ // (Interpret len as a negative 8-bit int.)
+ len ^= 0x0FF;
+ len += 2;
+ val = get8u(s);
+ count += len;
+ while (len) {
+ *p = val;
+ p += 4;
+ len--;
+ }
+ }
+ }
+ }
+ }
+
+ } else {
+ // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
+ // where each channel consists of an 8-bit value for each pixel in the image.
+
+ // Read the data by channel.
+ for (channel = 0; channel < 4; channel++) {
+ stbi__uint8 *p;
+
+ p = out + channel;
+ if (channel >= channelCount) { // >= so a missing channel gets default fill, matching the RLE path
+ // Fill this channel with default data.
+ for (i = 0; i < pixelCount; i++) *p = channel == 3 ? 255 : 0, p += 4;
+ } else {
+ // Read the data.
+ for (i = 0; i < pixelCount; i++)
+ *p = get8u(s), p += 4;
+ }
+ }
+ }
+
+ if (req_comp && req_comp != 4) {
+ out = convert_format(out, 4, req_comp, w, h);
+ if (out == NULL) return out; // convert_format frees input on failure
+ }
+
+ if (comp) *comp = channelCount;
+ *y = h;
+ *x = w;
+
+ return out;
+}
+
+static stbi_uc *stbi_psd_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ return psd_load(s,x,y,comp,req_comp);
+}
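+
+#if 0
+// Illustrative sketch (not part of stb_image; compiled out): the PackBits-style
+// RLE rule described in the comments inside psd_load above, decoding from src
+// until dest holds `expected` bytes. All names here are hypothetical.
+static int psd_rle_sketch(const unsigned char *src, int srclen,
+                          unsigned char *dest, int expected)
+{
+   int si = 0, di = 0;
+   while (di < expected && si < srclen) {
+      int n = (signed char) src[si++];
+      if (n >= 0) {            // 0..127: copy the next n+1 bytes literally
+         int run = n + 1;
+         while (run-- > 0 && si < srclen && di < expected)
+            dest[di++] = src[si++];
+      } else if (n != -128) {  // -127..-1: repeat the next byte -n+1 times
+         int run = 1 - n;
+         unsigned char v = (si < srclen) ? src[si++] : 0;
+         while (run-- > 0 && di < expected)
+            dest[di++] = v;
+      }                        // -128 (0x80): no-op
+   }
+   return di;                  // number of bytes actually produced
+}
+#endif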
+
+// *************************************************************************************************
+// Softimage PIC loader
+// by Tom Seddon
+//
+// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
+// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
+
+static int pic_is4(stbi *s,const char *str)
+{
+ int i;
+ for (i=0; i<4; ++i)
+ if (get8(s) != (stbi_uc)str[i])
+ return 0;
+
+ return 1;
+}
+
+static int pic_test(stbi *s)
+{
+ int i;
+
+ if (!pic_is4(s,"\x53\x80\xF6\x34"))
+ return 0;
+
+ for(i=0;i<84;++i)
+ get8(s);
+
+ if (!pic_is4(s,"PICT"))
+ return 0;
+
+ return 1;
+}
+
+typedef struct
+{
+ stbi_uc size,type,channel;
+} pic_packet_t;
+
+static stbi_uc *pic_readval(stbi *s, int channel, stbi_uc *dest)
+{
+ int mask=0x80, i;
+
+ for (i=0; i<4; ++i, mask>>=1) {
+ if (channel & mask) {
+ if (at_eof(s)) return epuc("bad file","PIC file too short");
+ dest[i]=get8u(s);
+ }
+ }
+
+ return dest;
+}
+
+static void pic_copyval(int channel,stbi_uc *dest,const stbi_uc *src)
+{
+ int mask=0x80,i;
+
+ for (i=0;i<4; ++i, mask>>=1)
+ if (channel&mask)
+ dest[i]=src[i];
+}
+
+static stbi_uc *pic_load2(stbi *s,int width,int height,int *comp, stbi_uc *result)
+{
+ int act_comp=0,num_packets=0,y,chained;
+ pic_packet_t packets[10];
+
+ // this will (should...) cater for even some bizarre stuff like having data
+ // for the same channel in multiple packets.
+ do {
+ pic_packet_t *packet;
+
+ if (num_packets==sizeof(packets)/sizeof(packets[0]))
+ return epuc("bad format","too many packets");
+
+ packet = &packets[num_packets++];
+
+ chained = get8(s);
+ packet->size = get8u(s);
+ packet->type = get8u(s);
+ packet->channel = get8u(s);
+
+ act_comp |= packet->channel;
+
+ if (at_eof(s)) return epuc("bad file","file too short (reading packets)");
+ if (packet->size != 8) return epuc("bad format","packet isn't 8bpp");
+ } while (chained);
+
+ *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
+
+ for(y=0; y<height; ++y) {
+ int packet_idx;
+
+ for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
+ pic_packet_t *packet = &packets[packet_idx];
+ stbi_uc *dest = result+y*width*4;
+
+ switch (packet->type) {
+ default:
+ return epuc("bad format","packet has bad compression type");
+
+ case 0: {//uncompressed
+ int x;
+
+ for(x=0;x<width;++x, dest+=4)
+ if (!pic_readval(s,packet->channel,dest))
+ return 0;
+ break;
+ }
+
+ case 1://Pure RLE
+ {
+ int left=width, i;
+
+ while (left>0) {
+ stbi_uc count,value[4];
+
+ count=get8u(s);
+ if (at_eof(s)) return epuc("bad file","file too short (pure read count)");
+
+ if (count > left)
+ count = (stbi__uint8) left;
+
+ if (!pic_readval(s,packet->channel,value)) return 0;
+
+ for(i=0; i<count; ++i,dest+=4)
+ pic_copyval(packet->channel,dest,value);
+ left -= count;
+ }
+ }
+ break;
+
+ case 2: {//Mixed RLE
+ int left=width;
+ while (left>0) {
+ int count = get8(s), i;
+ if (at_eof(s)) return epuc("bad file","file too short (mixed read count)");
+
+ if (count >= 128) { // Repeated
+ stbi_uc value[4];
+ int i;
+
+ if (count==128)
+ count = get16(s);
+ else
+ count -= 127;
+ if (count > left)
+ return epuc("bad file","scanline overrun");
+
+ if (!pic_readval(s,packet->channel,value))
+ return 0;
+
+ for(i=0;i<count;++i, dest+=4)
+ pic_copyval(packet->channel,dest,value);
+ } else { // Raw
+ ++count;
+ if (count>left) return epuc("bad file","scanline overrun");
+
+ for(i=0;i<count;++i, dest+=4)
+ if (!pic_readval(s,packet->channel,dest))
+ return 0;
+ }
+ left-=count;
+ }
+ break;
+ }
+ }
+ }
+ }
+
+ return result;
+}
+
+static stbi_uc *pic_load(stbi *s,int *px,int *py,int *comp,int req_comp)
+{
+ stbi_uc *result;
+ int i, x,y;
+
+ for (i=0; i<92; ++i)
+ get8(s);
+
+ x = get16(s);
+ y = get16(s);
+ if (at_eof(s)) return epuc("bad file","file too short (pic header)");
+ if ((1 << 28) / x < y) return epuc("too large", "Image too large to decode");
+
+ get32(s); //skip `ratio'
+ get16(s); //skip `fields'
+ get16(s); //skip `pad'
+
+ // intermediate buffer is RGBA
+ result = (stbi_uc *) malloc(x*y*4);
+ if (!result) return epuc("outofmem", "Out of memory");
+ memset(result, 0xff, x*y*4);
+
+ if (!pic_load2(s,x,y,comp, result)) {
+ free(result);
+ result=0;
+ }
+ *px = x;
+ *py = y;
+ if (req_comp == 0) req_comp = *comp;
+ result=convert_format(result,4,req_comp,x,y);
+
+ return result;
+}
+
+static int stbi_pic_test(stbi *s)
+{
+ int r = pic_test(s);
+ stbi_rewind(s);
+ return r;
+}
+
+static stbi_uc *stbi_pic_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ return pic_load(s,x,y,comp,req_comp);
+}
+
+// *************************************************************************************************
+// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
+typedef struct stbi_gif_lzw_struct {
+ stbi__int16 prefix;
+ stbi__uint8 first;
+ stbi__uint8 suffix;
+} stbi_gif_lzw;
+
+typedef struct stbi_gif_struct
+{
+ int w,h;
+ stbi_uc *out; // output buffer (always 4 components)
+ int flags, bgindex, ratio, transparent, eflags;
+ stbi__uint8 pal[256][4];
+ stbi__uint8 lpal[256][4];
+ stbi_gif_lzw codes[4096];
+ stbi__uint8 *color_table;
+ int parse, step;
+ int lflags;
+ int start_x, start_y;
+ int max_x, max_y;
+ int cur_x, cur_y;
+ int line_size;
+} stbi_gif;
+
+static int gif_test(stbi *s)
+{
+ int sz;
+ if (get8(s) != 'G' || get8(s) != 'I' || get8(s) != 'F' || get8(s) != '8') return 0;
+ sz = get8(s);
+ if (sz != '9' && sz != '7') return 0;
+ if (get8(s) != 'a') return 0;
+ return 1;
+}
+
+static int stbi_gif_test(stbi *s)
+{
+ int r = gif_test(s);
+ stbi_rewind(s);
+ return r;
+}
+
+static void stbi_gif_parse_colortable(stbi *s, stbi__uint8 pal[256][4], int num_entries, int transp)
+{
+ int i;
+ for (i=0; i < num_entries; ++i) {
+ pal[i][2] = get8u(s);
+ pal[i][1] = get8u(s);
+ pal[i][0] = get8u(s);
+ pal[i][3] = transp ? 0 : 255;
+ }
+}
+
+static int stbi_gif_header(stbi *s, stbi_gif *g, int *comp, int is_info)
+{
+ stbi__uint8 version;
+ if (get8(s) != 'G' || get8(s) != 'I' || get8(s) != 'F' || get8(s) != '8')
+ return e("not GIF", "Corrupt GIF");
+
+ version = get8u(s);
+ if (version != '7' && version != '9') return e("not GIF", "Corrupt GIF");
+ if (get8(s) != 'a') return e("not GIF", "Corrupt GIF");
+
+ failure_reason = "";
+ g->w = get16le(s);
+ g->h = get16le(s);
+ g->flags = get8(s);
+ g->bgindex = get8(s);
+ g->ratio = get8(s);
+ g->transparent = -1;
+
+ if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
+
+ if (is_info) return 1;
+
+ if (g->flags & 0x80)
+ stbi_gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
+
+ return 1;
+}
+
+static int stbi_gif_info_raw(stbi *s, int *x, int *y, int *comp)
+{
+ stbi_gif g;
+ if (!stbi_gif_header(s, &g, comp, 1)) {
+ stbi_rewind( s );
+ return 0;
+ }
+ if (x) *x = g.w;
+ if (y) *y = g.h;
+ return 1;
+}
+
+static void stbi_out_gif_code(stbi_gif *g, stbi__uint16 code)
+{
+ stbi__uint8 *p, *c;
+
+ // recurse to decode the prefixes, since the linked-list is backwards,
+ // and working backwards through an interleaved image would be nasty
+ if (g->codes[code].prefix >= 0)
+ stbi_out_gif_code(g, g->codes[code].prefix);
+
+ if (g->cur_y >= g->max_y) return;
+
+ p = &g->out[g->cur_x + g->cur_y];
+ c = &g->color_table[g->codes[code].suffix * 4];
+
+ if (c[3] >= 128) {
+ p[0] = c[2];
+ p[1] = c[1];
+ p[2] = c[0];
+ p[3] = c[3];
+ }
+ g->cur_x += 4;
+
+ if (g->cur_x >= g->max_x) {
+ g->cur_x = g->start_x;
+ g->cur_y += g->step;
+
+ while (g->cur_y >= g->max_y && g->parse > 0) {
+ g->step = (1 << g->parse) * g->line_size;
+ g->cur_y = g->start_y + (g->step >> 1);
+ --g->parse;
+ }
+ }
+}
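+
+#if 0
+// Illustrative sketch (not part of stb_image; compiled out): the interlace
+// pass pattern that the step/parse bookkeeping above walks through --
+// rows 0,8,16,... then 4,12,..., then 2,6,10,..., then 1,3,5,...
+// Names here are hypothetical.
+static void gif_interlace_sketch(int height, void (*emit_row)(int y))
+{
+   static const int start[4] = { 0, 4, 2, 1 };
+   static const int step[4]  = { 8, 8, 4, 2 };
+   int pass, y;
+   for (pass = 0; pass < 4; ++pass)
+      for (y = start[pass]; y < height; y += step[pass])
+         emit_row(y);
+}
+#endif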
+
+static stbi__uint8 *stbi_process_gif_raster(stbi *s, stbi_gif *g)
+{
+ stbi__uint8 lzw_cs;
+ stbi__int32 len, code;
+ stbi__uint32 first;
+ stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
+ stbi_gif_lzw *p;
+
+ lzw_cs = get8u(s);
+ clear = 1 << lzw_cs;
+ first = 1;
+ codesize = lzw_cs + 1;
+ codemask = (1 << codesize) - 1;
+ bits = 0;
+ valid_bits = 0;
+ for (code = 0; code < clear; code++) {
+ g->codes[code].prefix = -1;
+ g->codes[code].first = (stbi__uint8) code;
+ g->codes[code].suffix = (stbi__uint8) code;
+ }
+
+ // support no starting clear code
+ avail = clear+2;
+ oldcode = -1;
+
+ len = 0;
+ for(;;) {
+ if (valid_bits < codesize) {
+ if (len == 0) {
+ len = get8(s); // start new block
+ if (len == 0)
+ return g->out;
+ }
+ --len;
+ bits |= (stbi__int32) get8(s) << valid_bits;
+ valid_bits += 8;
+ } else {
+ stbi__int32 code = bits & codemask;
+ bits >>= codesize;
+ valid_bits -= codesize;
+ // @OPTIMIZE: is there some way we can accelerate the non-clear path?
+ if (code == clear) { // clear code
+ codesize = lzw_cs + 1;
+ codemask = (1 << codesize) - 1;
+ avail = clear + 2;
+ oldcode = -1;
+ first = 0;
+ } else if (code == clear + 1) { // end of stream code
+ skip(s, len);
+ while ((len = get8(s)) > 0)
+ skip(s,len);
+ return g->out;
+ } else if (code <= avail) {
+ if (first) return epuc("no clear code", "Corrupt GIF");
+
+ if (oldcode >= 0) {
+ p = &g->codes[avail++];
+ if (avail > 4096) return epuc("too many codes", "Corrupt GIF");
+ p->prefix = (stbi__int16) oldcode;
+ p->first = g->codes[oldcode].first;
+ p->suffix = (code == avail) ? p->first : g->codes[code].first;
+ } else if (code == avail)
+ return epuc("illegal code in raster", "Corrupt GIF");
+
+ stbi_out_gif_code(g, (stbi__uint16) code);
+
+ if ((avail & codemask) == 0 && avail <= 0x0FFF) {
+ codesize++;
+ codemask = (1 << codesize) - 1;
+ }
+
+ oldcode = code;
+ } else {
+ return epuc("illegal code in raster", "Corrupt GIF");
+ }
+ }
+ }
+}
+
+static void stbi_fill_gif_background(stbi_gif *g)
+{
+ int i;
+ stbi__uint8 *c = g->pal[g->bgindex];
+ // @OPTIMIZE: write a dword at a time
+ for (i = 0; i < g->w * g->h * 4; i += 4) {
+ stbi__uint8 *p = &g->out[i];
+ p[0] = c[2];
+ p[1] = c[1];
+ p[2] = c[0];
+ p[3] = c[3];
+ }
+}
+
+// this function is designed to support animated gifs, although stb_image doesn't support it
+static stbi__uint8 *stbi_gif_load_next(stbi *s, stbi_gif *g, int *comp, int req_comp)
+{
+ int i;
+ stbi__uint8 *old_out = 0;
+
+ if (g->out == 0) {
+ if (!stbi_gif_header(s, g, comp,0)) return 0; // failure_reason set by stbi_gif_header
+ g->out = (stbi__uint8 *) malloc(4 * g->w * g->h);
+ if (g->out == 0) return epuc("outofmem", "Out of memory");
+ stbi_fill_gif_background(g);
+ } else {
+ // animated-gif-only path
+ if (((g->eflags & 0x1C) >> 2) == 3) {
+ old_out = g->out;
+ g->out = (stbi__uint8 *) malloc(4 * g->w * g->h);
+ if (g->out == 0) return epuc("outofmem", "Out of memory");
+ memcpy(g->out, old_out, g->w*g->h*4);
+ }
+ }
+
+ for (;;) {
+ switch (get8(s)) {
+ case 0x2C: /* Image Descriptor */
+ {
+ stbi__int32 x, y, w, h;
+ stbi__uint8 *o;
+
+ x = get16le(s);
+ y = get16le(s);
+ w = get16le(s);
+ h = get16le(s);
+ if (((x + w) > (g->w)) || ((y + h) > (g->h)))
+ return epuc("bad Image Descriptor", "Corrupt GIF");
+
+ g->line_size = g->w * 4;
+ g->start_x = x * 4;
+ g->start_y = y * g->line_size;
+ g->max_x = g->start_x + w * 4;
+ g->max_y = g->start_y + h * g->line_size;
+ g->cur_x = g->start_x;
+ g->cur_y = g->start_y;
+
+ g->lflags = get8(s);
+
+ if (g->lflags & 0x40) {
+ g->step = 8 * g->line_size; // first interlaced spacing
+ g->parse = 3;
+ } else {
+ g->step = g->line_size;
+ g->parse = 0;
+ }
+
+ if (g->lflags & 0x80) {
+ stbi_gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
+ g->color_table = (stbi__uint8 *) g->lpal;
+ } else if (g->flags & 0x80) {
+ for (i=0; i < 256; ++i) // @OPTIMIZE: reset only the previous transparent
+ g->pal[i][3] = 255;
+ if (g->transparent >= 0 && (g->eflags & 0x01))
+ g->pal[g->transparent][3] = 0;
+ g->color_table = (stbi__uint8 *) g->pal;
+ } else
+ return epuc("missing color table", "Corrupt GIF");
+
+ o = stbi_process_gif_raster(s, g);
+ if (o == NULL) return NULL;
+
+ if (req_comp && req_comp != 4)
+ o = convert_format(o, 4, req_comp, g->w, g->h);
+ return o;
+ }
+
+ case 0x21: // Comment Extension.
+ {
+ int len;
+ if (get8(s) == 0xF9) { // Graphic Control Extension.
+ len = get8(s);
+ if (len == 4) {
+ g->eflags = get8(s);
+ get16le(s); // delay
+ g->transparent = get8(s);
+ } else {
+ skip(s, len);
+ break;
+ }
+ }
+ while ((len = get8(s)) != 0)
+ skip(s, len);
+ break;
+ }
+
+ case 0x3B: // gif stream termination code
+ return (stbi__uint8 *) 1;
+
+ default:
+ return epuc("unknown code", "Corrupt GIF");
+ }
+ }
+}
+
+static stbi_uc *stbi_gif_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__uint8 *u = 0;
+ stbi_gif g={0};
+
+ u = stbi_gif_load_next(s, &g, comp, req_comp);
+ if (u == (void *) 1) u = 0; // end of animated gif marker
+ if (u) {
+ *x = g.w;
+ *y = g.h;
+ }
+
+ return u;
+}
+
+static int stbi_gif_info(stbi *s, int *x, int *y, int *comp)
+{
+ return stbi_gif_info_raw(s,x,y,comp);
+}
+
+
+// *************************************************************************************************
+// Radiance RGBE HDR loader
+// originally by Nicolas Schulz
+#ifndef STBI_NO_HDR
+static int hdr_test(stbi *s)
+{
+ const char *signature = "#?RADIANCE\n";
+ int i;
+ for (i=0; signature[i]; ++i)
+ if (get8(s) != signature[i])
+ return 0;
+ return 1;
+}
+
+static int stbi_hdr_test(stbi* s)
+{
+ int r = hdr_test(s);
+ stbi_rewind(s);
+ return r;
+}
+
+#define HDR_BUFLEN 1024
+static char *hdr_gettoken(stbi *z, char *buffer)
+{
+ int len=0;
+ char c = '\0';
+
+ c = (char) get8(z);
+
+ while (!at_eof(z) && c != '\n') {
+ buffer[len++] = c;
+ if (len == HDR_BUFLEN-1) {
+ // flush to end of line
+ while (!at_eof(z) && get8(z) != '\n')
+ ;
+ break;
+ }
+ c = (char) get8(z);
+ }
+
+ buffer[len] = 0;
+ return buffer;
+}
+
+static void hdr_convert(float *output, stbi_uc *input, int req_comp)
+{
+ if ( input[3] != 0 ) {
+ float f1;
+ // Exponent
+ f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
+ if (req_comp <= 2)
+ output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
+ else {
+ output[0] = input[0] * f1;
+ output[1] = input[1] * f1;
+ output[2] = input[2] * f1;
+ }
+ if (req_comp == 2) output[1] = 1;
+ if (req_comp == 4) output[3] = 1;
+ } else {
+ switch (req_comp) {
+ case 4: output[3] = 1; /* fallthrough */
+ case 3: output[0] = output[1] = output[2] = 0;
+ break;
+ case 2: output[1] = 1; /* fallthrough */
+ case 1: output[0] = 0;
+ break;
+ }
+ }
+}
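+
+#if 0
+// Illustrative sketch (not part of stb_image; compiled out): the Radiance
+// RGBE rule that hdr_convert above applies. The three mantissa bytes share
+// the exponent byte e, each decoding to m * 2^(e-128) / 256, i.e.
+// ldexp(m, e - (128 + 8)); a zero exponent byte means black.
+static void rgbe_sketch(const unsigned char rgbe[4], float out[3])
+{
+   if (rgbe[3]) {
+      float f = (float) ldexp(1.0, rgbe[3] - (128 + 8));
+      out[0] = rgbe[0] * f;
+      out[1] = rgbe[1] * f;
+      out[2] = rgbe[2] * f;
+   } else {
+      out[0] = out[1] = out[2] = 0.0f;
+   }
+}
+#endif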
+
+static float *hdr_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ char buffer[HDR_BUFLEN];
+ char *token;
+ int valid = 0;
+ int width, height;
+ stbi_uc *scanline;
+ float *hdr_data;
+ int len;
+ unsigned char count, value;
+ int i, j, k, c1,c2, z;
+
+
+ // Check identifier
+ if (strcmp(hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
+ return epf("not HDR", "Corrupt HDR image");
+
+ // Parse header
+ for(;;) {
+ token = hdr_gettoken(s,buffer);
+ if (token[0] == 0) break;
+ if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
+ }
+
+ if (!valid) return epf("unsupported format", "Unsupported HDR format");
+
+ // Parse width and height
+ // can't use sscanf() if we're not using stdio!
+ token = hdr_gettoken(s,buffer);
+ if (strncmp(token, "-Y ", 3)) return epf("unsupported data layout", "Unsupported HDR format");
+ token += 3;
+ height = (int) strtol(token, &token, 10);
+ while (*token == ' ') ++token;
+ if (strncmp(token, "+X ", 3)) return epf("unsupported data layout", "Unsupported HDR format");
+ token += 3;
+ width = (int) strtol(token, NULL, 10);
+
+ *x = width;
+ *y = height;
+
+ *comp = 3;
+ if (req_comp == 0) req_comp = 3;
+
+ // Read data
+ hdr_data = (float *) malloc(height * width * req_comp * sizeof(float));
+ if (!hdr_data) return epf("outofmem", "Out of memory");
+
+ // Load image data
+ // image data is stored as some number of scan lines
+ if ( width < 8 || width >= 32768) {
+ // Read flat data
+ for (j=0; j < height; ++j) {
+ for (i=0; i < width; ++i) {
+ stbi_uc rgbe[4];
+ main_decode_loop:
+ getn(s, rgbe, 4);
+ hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
+ }
+ }
+ } else {
+ // Read RLE-encoded data
+ scanline = NULL;
+
+ for (j = 0; j < height; ++j) {
+ c1 = get8(s);
+ c2 = get8(s);
+ len = get8(s);
+ if (c1 != 2 || c2 != 2 || (len & 0x80)) {
+ // not run-length encoded, so we have to actually use THIS data as a decoded
+ // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
+ stbi__uint8 rgbe[4];
+ rgbe[0] = (stbi__uint8) c1;
+ rgbe[1] = (stbi__uint8) c2;
+ rgbe[2] = (stbi__uint8) len;
+ rgbe[3] = (stbi__uint8) get8u(s);
+ hdr_convert(hdr_data, rgbe, req_comp);
+ i = 1;
+ j = 0;
+ free(scanline);
+ goto main_decode_loop; // yes, this makes no sense
+ }
+ len <<= 8;
+ len |= get8(s);
+ if (len != width) { free(hdr_data); free(scanline); return epf("invalid decoded scanline length", "corrupt HDR"); }
+ if (scanline == NULL) {
+ scanline = (stbi_uc *) malloc(width * 4);
+ if (!scanline) { free(hdr_data); return epf("outofmem", "Out of memory"); }
+ }
+
+ for (k = 0; k < 4; ++k) {
+ i = 0;
+ while (i < width) {
+ count = get8u(s);
+ if (count > 128) {
+ // Run
+ value = get8u(s);
+ count -= 128;
+ for (z = 0; z < count; ++z)
+ scanline[i++ * 4 + k] = value;
+ } else {
+ // Dump
+ for (z = 0; z < count; ++z)
+ scanline[i++ * 4 + k] = get8u(s);
+ }
+ }
+ }
+ for (i=0; i < width; ++i)
+ hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
+ }
+ free(scanline);
+ }
+
+ return hdr_data;
+}
+
+static float *stbi_hdr_load(stbi *s, int *x, int *y, int *comp, int req_comp)
+{
+ return hdr_load(s,x,y,comp,req_comp);
+}
+
+static int stbi_hdr_info(stbi *s, int *x, int *y, int *comp)
+{
+ char buffer[HDR_BUFLEN];
+ char *token;
+ int valid = 0;
+
+ if (strcmp(hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
+ stbi_rewind( s );
+ return 0;
+ }
+
+ for(;;) {
+ token = hdr_gettoken(s,buffer);
+ if (token[0] == 0) break;
+ if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
+ }
+
+ if (!valid) {
+ stbi_rewind( s );
+ return 0;
+ }
+ token = hdr_gettoken(s,buffer);
+ if (strncmp(token, "-Y ", 3)) {
+ stbi_rewind( s );
+ return 0;
+ }
+ token += 3;
+ *y = (int) strtol(token, &token, 10);
+ while (*token == ' ') ++token;
+ if (strncmp(token, "+X ", 3)) {
+ stbi_rewind( s );
+ return 0;
+ }
+ token += 3;
+ *x = (int) strtol(token, NULL, 10);
+ *comp = 3;
+ return 1;
+}
+#endif // STBI_NO_HDR
+
+static int stbi_bmp_info(stbi *s, int *x, int *y, int *comp)
+{
+ int hsz;
+ if (get8(s) != 'B' || get8(s) != 'M') {
+ stbi_rewind( s );
+ return 0;
+ }
+ skip(s,12);
+ hsz = get32le(s);
+ if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108) {
+ stbi_rewind( s );
+ return 0;
+ }
+ if (hsz == 12) {
+ *x = get16le(s);
+ *y = get16le(s);
+ } else {
+ *x = get32le(s);
+ *y = get32le(s);
+ }
+ if (get16le(s) != 1) {
+ stbi_rewind( s );
+ return 0;
+ }
+ *comp = get16le(s) / 8;
+ return 1;
+}
+
+static int stbi_psd_info(stbi *s, int *x, int *y, int *comp)
+{
+ int channelCount;
+ if (get32(s) != 0x38425053) {
+ stbi_rewind( s );
+ return 0;
+ }
+ if (get16(s) != 1) {
+ stbi_rewind( s );
+ return 0;
+ }
+ skip(s, 6);
+ channelCount = get16(s);
+ if (channelCount < 0 || channelCount > 16) {
+ stbi_rewind( s );
+ return 0;
+ }
+ *y = get32(s);
+ *x = get32(s);
+ if (get16(s) != 8) {
+ stbi_rewind( s );
+ return 0;
+ }
+ if (get16(s) != 3) {
+ stbi_rewind( s );
+ return 0;
+ }
+ *comp = 4;
+ return 1;
+}
+
+static int stbi_pic_info(stbi *s, int *x, int *y, int *comp)
+{
+ int act_comp=0,num_packets=0,chained;
+ pic_packet_t packets[10];
+
+ skip(s, 92);
+
+ *x = get16(s);
+ *y = get16(s);
+ if (at_eof(s)) return 0;
+ if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
+ stbi_rewind( s );
+ return 0;
+ }
+
+ skip(s, 8);
+
+ do {
+ pic_packet_t *packet;
+
+ if (num_packets==sizeof(packets)/sizeof(packets[0]))
+ return 0;
+
+ packet = &packets[num_packets++];
+ chained = get8(s);
+ packet->size = get8u(s);
+ packet->type = get8u(s);
+ packet->channel = get8u(s);
+ act_comp |= packet->channel;
+
+ if (at_eof(s)) {
+ stbi_rewind( s );
+ return 0;
+ }
+ if (packet->size != 8) {
+ stbi_rewind( s );
+ return 0;
+ }
+ } while (chained);
+
+ *comp = (act_comp & 0x10 ? 4 : 3);
+
+ return 1;
+}
+
+static int stbi_info_main(stbi *s, int *x, int *y, int *comp)
+{
+ if (stbi_jpeg_info(s, x, y, comp))
+ return 1;
+ if (stbi_png_info(s, x, y, comp))
+ return 1;
+ if (stbi_gif_info(s, x, y, comp))
+ return 1;
+ if (stbi_bmp_info(s, x, y, comp))
+ return 1;
+ if (stbi_psd_info(s, x, y, comp))
+ return 1;
+ if (stbi_pic_info(s, x, y, comp))
+ return 1;
+ #ifndef STBI_NO_HDR
+ if (stbi_hdr_info(s, x, y, comp))
+ return 1;
+ #endif
+ // test tga last because it's a crappy test!
+ if (stbi_tga_info(s, x, y, comp))
+ return 1;
+ return e("unknown image type", "Image not of any known type, or corrupt");
+}
+
+#ifndef STBI_NO_STDIO
+int stbi_info(char const *filename, int *x, int *y, int *comp)
+{
+ FILE *f = fopen(filename, "rb");
+ int result;
+ if (!f) return e("can't fopen", "Unable to open file");
+ result = stbi_info_from_file(f, x, y, comp);
+ fclose(f);
+ return result;
+}
+
+int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
+{
+ int r;
+ stbi s;
+ long pos = ftell(f);
+ start_file(&s, f);
+ r = stbi_info_main(&s,x,y,comp);
+ fseek(f,pos,SEEK_SET);
+ return r;
+}
+#endif // !STBI_NO_STDIO
+
+int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
+{
+ stbi s;
+ start_mem(&s,buffer,len);
+ return stbi_info_main(&s,x,y,comp);
+}
+
+int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
+{
+ stbi s;
+ start_callbacks(&s, (stbi_io_callbacks *) c, user);
+ return stbi_info_main(&s,x,y,comp);
+}
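+
+#if 0
+// Illustrative usage sketch (not part of stb_image; compiled out): querying
+// image dimensions without decoding. stbi_info returns nonzero on success and
+// fills in width, height, and component count. "image.png" is hypothetical.
+#include <stdio.h>
+int main(void)
+{
+   int w, h, comp;
+   if (stbi_info("image.png", &w, &h, &comp))
+      printf("%d x %d, %d components\n", w, h, comp);
+   return 0;
+}
+#endif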
+
+#endif // STBI_HEADER_FILE_ONLY
+
+#if !defined(STBI_NO_STDIO) && defined(_MSC_VER) && _MSC_VER >= 1400
+#pragma warning(pop)
+#endif
+
+
+/*
+ revision history:
+ 1.35 (2014-05-27)
+ various warnings
+ fix broken STBI_SIMD path
+ fix bug where stbi_load_from_file no longer left file pointer in correct place
+ fix broken non-easy path for 32-bit BMP (possibly never used)
+ TGA optimization by Arseny Kapoulkine
+ 1.34 (unknown)
+ use STBI_NOTUSED in resample_row_generic(), fix one more leak in tga failure case
+ 1.33 (2011-07-14)
+ make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
+ 1.32 (2011-07-13)
+ support for "info" function for all supported filetypes (SpartanJ)
+ 1.31 (2011-06-20)
+ a few more leak fixes, bug in PNG handling (SpartanJ)
+ 1.30 (2011-06-11)
+ added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
+ removed deprecated format-specific test/load functions
+ removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
+ error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
+ fix inefficiency in decoding 32-bit BMP (David Woo)
+ 1.29 (2010-08-16)
+ various warning fixes from Aurelien Pocheville
+ 1.28 (2010-08-01)
+ fix bug in GIF palette transparency (SpartanJ)
+ 1.27 (2010-08-01)
+ cast-to-stbi__uint8 to fix warnings
+ 1.26 (2010-07-24)
+ fix bug in file buffering for PNG reported by SpartanJ
+ 1.25 (2010-07-17)
+ refix trans_data warning (Won Chun)
+ 1.24 (2010-07-12)
+ perf improvements reading from files on platforms with lock-heavy fgetc()
+ minor perf improvements for jpeg
+ deprecated type-specific functions so we'll get feedback if they're needed
+ attempt to fix trans_data warning (Won Chun)
+ 1.23 fixed bug in iPhone support
+ 1.22 (2010-07-10)
+ removed image *writing* support
+ stbi_info support from Jetro Lauha
+ GIF support from Jean-Marc Lienher
+ iPhone PNG-extensions from James Brown
+ warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
+ 1.21 fix use of 'stbi__uint8' in header (reported by jon blow)
+ 1.20 added support for Softimage PIC, by Tom Seddon
+ 1.19 bug in interlaced PNG corruption check (found by ryg)
+ 1.18 2008-08-02
+ fix a threading bug (local mutable static)
+ 1.17 support interlaced PNG
+ 1.16 major bugfix - convert_format converted one too many pixels
+ 1.15 initialize some fields for thread safety
+ 1.14 fix threadsafe conversion bug
+ header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
+ 1.13 threadsafe
+ 1.12 const qualifiers in the API
+ 1.11 Support installable IDCT, colorspace conversion routines
+ 1.10 Fixes for 64-bit (don't use "unsigned long")
+ optimized upsampling by Fabian "ryg" Giesen
+ 1.09 Fix format-conversion for PSD code (bad global variables!)
+ 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz
+ 1.07 attempt to fix C++ warning/errors again
+ 1.06 attempt to fix C++ warning/errors again
+ 1.05 fix TGA loading to return correct *comp and use good luminance calc
+ 1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free
+ 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR
+ 1.02 support for (subset of) HDR files, float interface for preferred access to them
+ 1.01 fix bug: possible bug in handling right-side up bmps... not sure
+ fix bug: the stbi_bmp_load() and stbi_tga_load() functions didn't work at all
+ 1.00 interface to zlib that skips zlib header
+ 0.99 correct handling of alpha in palette
+ 0.98 TGA loader by lonesock; dynamically add loaders (untested)
+ 0.97 jpeg errors on too large a file; also catch another malloc failure
+ 0.96 fix detection of invalid v value - particleman@mollyrocket forum
+ 0.95 during header scan, seek to markers in case of padding
+ 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same
+ 0.93 handle jpegtran output; verbose errors
+ 0.92 read 4,8,16,24,32-bit BMP files of several formats
+ 0.91 output 24-bit Windows 3.0 BMP files
+ 0.90 fix a few more warnings; bump version number to approach 1.0
+ 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd
+ 0.60 fix compiling as c++
+ 0.59 fix warnings: merge Dave Moore's -Wall fixes
+ 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian
+ 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
+ 0.56 fix bug: zlib uncompressed mode len vs. nlen
+ 0.55 fix bug: restart_interval not initialized to 0
+ 0.54 allow NULL for 'int *comp'
+ 0.53 fix bug in png 3->4; speedup png decoding
+ 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments
+ 0.51 obey req_comp requests, 1-component jpegs return as 1-component,
+ on 'test' only check type, not whether we support this variant
+ 0.50 first released version
+*/
diff --git a/vendor/stb/deprecated/stb_image_resize.h b/vendor/stb/deprecated/stb_image_resize.h
new file mode 100644
index 0000000..ef9e6fe
--- /dev/null
+++ b/vendor/stb/deprecated/stb_image_resize.h
@@ -0,0 +1,2634 @@
+/* stb_image_resize - v0.97 - public domain image resizing
+ by Jorge L Rodriguez (@VinoBS) - 2014
+ http://github.com/nothings/stb
+
+ Written with emphasis on usability, portability, and efficiency. (No
+ SIMD or threads, so it can be easily outperformed by libs that use those.)
+ Only scaling and translation are supported, no rotations or shears.
+ Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation.
+
+ COMPILING & LINKING
+ In one C/C++ file that #includes this file, do this:
+ #define STB_IMAGE_RESIZE_IMPLEMENTATION
+ before the #include. That will create the implementation in that file.
+
+ QUICKSTART
+ stbir_resize_uint8( input_pixels , in_w , in_h , 0,
+ output_pixels, out_w, out_h, 0, num_channels)
+ stbir_resize_float(...)
+ stbir_resize_uint8_srgb( input_pixels , in_w , in_h , 0,
+ output_pixels, out_w, out_h, 0,
+ num_channels , alpha_chan , 0)
+ stbir_resize_uint8_srgb_edgemode(
+ input_pixels , in_w , in_h , 0,
+ output_pixels, out_w, out_h, 0,
+ num_channels , alpha_chan , 0, STBIR_EDGE_CLAMP)
+ // WRAP/REFLECT/ZERO
+
+ FULL API
+ See the "header file" section of the source for API documentation.
+
+ ADDITIONAL DOCUMENTATION
+
+ SRGB & FLOATING POINT REPRESENTATION
+ The sRGB functions presume IEEE floating point. If you do not have
+ IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
+ a slower implementation.
+
+ MEMORY ALLOCATION
+ The resize functions here perform a single memory allocation using
+ malloc. To control the memory allocation, before the #include that
+ triggers the implementation, do:
+
+ #define STBIR_MALLOC(size,context) ...
+ #define STBIR_FREE(ptr,context) ...
+
+ Each resize function makes exactly one call to malloc/free, so to use
+ temp memory, store the temp memory in the context and return that.
+
+ ASSERT
+ Define STBIR_ASSERT(boolval) to override assert() and not use assert.h
+
+ OPTIMIZATION
+ Define STBIR_SATURATE_INT to compute clamp values in-range using
+ integer operations instead of float operations. This may be faster
+ on some platforms.
+
+ DEFAULT FILTERS
+ For functions which don't provide explicit control over what filters
+ to use, you can change the compile-time defaults with
+
+ #define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_something
+ #define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_something
+
+ See stbir_filter in the header-file section for the list of filters.
+
+ NEW FILTERS
+ A number of 1D filter kernels are used. For a list of
+ supported filters see the stbir_filter enum. To add a new filter,
+ write a filter function and add it to stbir__filter_info_table.
+
+ PROGRESS
+ For interactive use with slow resize operations, you can install
+ a progress-report callback:
+
+ #define STBIR_PROGRESS_REPORT(val) some_func(val)
+
+ The parameter val is a float which goes from 0 to 1 as progress is made.
+
+ For example:
+
+ static void my_progress_report(float progress);
+ #define STBIR_PROGRESS_REPORT(val) my_progress_report(val)
+
+ #define STB_IMAGE_RESIZE_IMPLEMENTATION
+ #include "stb_image_resize.h"
+
+ static void my_progress_report(float progress)
+ {
+ printf("Progress: %f%%\n", progress*100);
+ }
+
+ MAX CHANNELS
+ If your image has more than 64 channels, define STBIR_MAX_CHANNELS
+ to the max you'll have.
+
+ ALPHA CHANNEL
+ Most of the resizing functions provide the ability to control how
+ the alpha channel of an image is processed. The important things
+ to know about this:
+
+ 1. The best mathematically-behaved version of alpha to use is
+ called "premultiplied alpha", in which the other color channels
+ have had the alpha value multiplied in. If you use premultiplied
+ alpha, linear filtering (such as image resampling done by this
+ library, or performed in texture units on GPUs) does the "right
+ thing". While premultiplied alpha is standard in the movie CGI
+ industry, it is still uncommon in the videogame/real-time world.
+
+ If you linearly filter non-premultiplied alpha, strange effects
+ occur. (For example, the 50/50 average of 99% transparent bright green
+ and 1% transparent black produces 50% transparent dark green when
+ non-premultiplied, whereas premultiplied it produces 50%
+ transparent near-black. The former introduces green energy
+ that doesn't exist in the source image.)
+
+ 2. Artists should not edit premultiplied-alpha images; artists
+ want non-premultiplied alpha images. Thus, art tools generally output
+ non-premultiplied alpha images.
+
+ 3. You will get best results in most cases by converting images
+ to premultiplied alpha before processing them mathematically.
+
+ 4. If you pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED, the
+ resizer does not do anything special for the alpha channel;
+ it is resampled identically to other channels. This produces
+ the correct results for premultiplied-alpha images, but produces
+ less-than-ideal results for non-premultiplied-alpha images.
+
+ 5. If you do not pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED,
+ then the resizer weights the contribution of input pixels
+ based on their alpha values, or, equivalently, it multiplies
+ the alpha value into the color channels, resamples, then divides
+ by the resultant alpha value. Input pixels which have alpha=0 do
+ not contribute at all to output pixels unless _all_ of the input
+ pixels affecting that output pixel have alpha=0, in which case
+ the result for that pixel is the same as it would be without
+ STBIR_FLAG_ALPHA_PREMULTIPLIED. However, this is only true for
+ input images in integer formats. For input images in float format,
+ input pixels with alpha=0 have no effect, and output pixels
+ which have alpha=0 will be 0 in all channels. (For float images,
+ you can manually achieve the same result by adding a tiny epsilon
+ value to the alpha channel of every image, and then subtracting
+ or clamping it at the end.)
+
+ 6. You can suppress the behavior described in #5 and make
+ all-0-alpha pixels have 0 in all channels by #defining
+ STBIR_NO_ALPHA_EPSILON.
+
+ 7. You can separately control whether the alpha channel is
+ interpreted as linear or affected by the colorspace. By default
+ it is linear; you almost never want to apply the colorspace.
+ (For example, graphics hardware does not apply sRGB conversion
+ to the alpha channel.)
+
+ CONTRIBUTORS
+ Jorge L Rodriguez: Implementation
+ Sean Barrett: API design, optimizations
+ Aras Pranckevicius: bugfix
+ Nathan Reed: warning fixes
+
+ REVISIONS
+ 0.97 (2020-02-02) fixed warning
+ 0.96 (2019-03-04) fixed warnings
+ 0.95 (2017-07-23) fixed warnings
+ 0.94 (2017-03-18) fixed warnings
+ 0.93 (2017-03-03) fixed bug with certain combinations of heights
+ 0.92 (2017-01-02) fix integer overflow on large (>2GB) images
+ 0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
+ 0.90 (2014-09-17) first released version
+
+ LICENSE
+ See end of file for license information.
+
+ TODO
+ Don't decode all of the image data when only processing a partial tile
+ Don't use full-width decode buffers when only processing a partial tile
+ When processing wide images, break processing into tiles so data fits in L1 cache
+ Installable filters?
+ Resize that respects alpha test coverage
+ (Reference code: FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage:
+ https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp )
+*/
+
+#ifndef STBIR_INCLUDE_STB_IMAGE_RESIZE_H
+#define STBIR_INCLUDE_STB_IMAGE_RESIZE_H
+
+#ifdef _MSC_VER
+typedef unsigned char stbir_uint8;
+typedef unsigned short stbir_uint16;
+typedef unsigned int stbir_uint32;
+#else
+#include <stdint.h>
+typedef uint8_t stbir_uint8;
+typedef uint16_t stbir_uint16;
+typedef uint32_t stbir_uint32;
+#endif
+
+#ifndef STBIRDEF
+#ifdef STB_IMAGE_RESIZE_STATIC
+#define STBIRDEF static
+#else
+#ifdef __cplusplus
+#define STBIRDEF extern "C"
+#else
+#define STBIRDEF extern
+#endif
+#endif
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Easy-to-use API:
+//
+// * "input pixels" points to an array of image data with 'num_channels' channels (e.g. RGB=3, RGBA=4)
+// * input_w is input image width (x-axis), input_h is input image height (y-axis)
+// * stride is the offset between successive rows of image data in memory, in bytes. you can
+// specify 0 to mean packed continuously in memory
+// * alpha channel is treated identically to other channels.
+// * colorspace is linear or sRGB as specified by function name
+// * returned result is 1 for success or 0 in case of an error.
+// #define STBIR_ASSERT() to trigger an assert on parameter validation errors.
+// * Memory required grows approximately linearly with input and output size, but with
+// discontinuities at input_w == output_w and input_h == output_h.
+// * These functions use a "default" resampling filter defined at compile time. To change the filter,
+// you can change the compile-time defaults by #defining STBIR_DEFAULT_FILTER_UPSAMPLE
+// and STBIR_DEFAULT_FILTER_DOWNSAMPLE, or you can use the medium-complexity API.
+
+STBIRDEF int stbir_resize_uint8( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels);
+
+STBIRDEF int stbir_resize_float( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels);
+
+
+// The following functions interpret image data as gamma-corrected sRGB.
+// Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
+// or otherwise provide the index of the alpha channel. Flags value
+// of 0 will probably do the right thing if you're not sure what
+// the flags mean.
+
+#define STBIR_ALPHA_CHANNEL_NONE -1
+
+// Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
+// use alpha-weighted resampling (effectively premultiplying, resampling,
+// then unpremultiplying).
+#define STBIR_FLAG_ALPHA_PREMULTIPLIED (1 << 0)
+// The specified alpha channel should be handled as gamma-corrected value even
+// when doing sRGB operations.
+#define STBIR_FLAG_ALPHA_USES_COLORSPACE (1 << 1)
+
+STBIRDEF int stbir_resize_uint8_srgb(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags);
+
+
+typedef enum
+{
+ STBIR_EDGE_CLAMP = 1,
+ STBIR_EDGE_REFLECT = 2,
+ STBIR_EDGE_WRAP = 3,
+ STBIR_EDGE_ZERO = 4,
+} stbir_edge;
+
+// This function adds the ability to specify how requests to sample off the edge of the image are handled.
+STBIRDEF int stbir_resize_uint8_srgb_edgemode(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode);
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Medium-complexity API
+//
+// This extends the easy-to-use API as follows:
+//
+// * Alpha-channel can be processed separately
+// * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
+// * Alpha channel will not be gamma corrected (unless flags&STBIR_FLAG_GAMMA_CORRECT)
+// * Filters will be weighted by alpha channel (unless flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
+// * Filter can be selected explicitly
+// * uint16 image type
+// * sRGB colorspace available for all types
+// * context parameter for passing to STBIR_MALLOC
+
+typedef enum
+{
+ STBIR_FILTER_DEFAULT = 0, // use same filter type that easy-to-use API chooses
+ STBIR_FILTER_BOX = 1, // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
+ STBIR_FILTER_TRIANGLE = 2, // On upsampling, produces same results as bilinear texture filtering
+ STBIR_FILTER_CUBICBSPLINE = 3, // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
+ STBIR_FILTER_CATMULLROM = 4, // An interpolating cubic spline
+ STBIR_FILTER_MITCHELL = 5, // Mitchell-Netravali filter with B=1/3, C=1/3
+} stbir_filter;
+
+typedef enum
+{
+ STBIR_COLORSPACE_LINEAR,
+ STBIR_COLORSPACE_SRGB,
+
+ STBIR_MAX_COLORSPACES,
+} stbir_colorspace;
+
+// The following functions are all identical except for the type of the image data
+
+STBIRDEF int stbir_resize_uint8_generic( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context);
+
+STBIRDEF int stbir_resize_uint16_generic(const stbir_uint16 *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ stbir_uint16 *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context);
+
+STBIRDEF int stbir_resize_float_generic( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context);
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Full-complexity API
+//
+// This extends the medium API as follows:
+//
+// * uint32 image type
+// * not typesafe
+// * separate filter types for each axis
+// * separate edge modes for each axis
+// * can specify scale explicitly for subpixel correctness
+// * can specify image source tile using texture coordinates
+
+typedef enum
+{
+ STBIR_TYPE_UINT8 ,
+ STBIR_TYPE_UINT16,
+ STBIR_TYPE_UINT32,
+ STBIR_TYPE_FLOAT ,
+
+ STBIR_MAX_TYPES
+} stbir_datatype;
+
+STBIRDEF int stbir_resize( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context);
+
+STBIRDEF int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float x_scale, float y_scale,
+ float x_offset, float y_offset);
+
+STBIRDEF int stbir_resize_region( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float s0, float t0, float s1, float t1);
+// (s0, t0) & (s1, t1) are the top-left and bottom right corner (uv addressing style: [0, 1]x[0, 1]) of a region of the input image to use.
+
+//
+//
+//// end header file /////////////////////////////////////////////////////
+#endif // STBIR_INCLUDE_STB_IMAGE_RESIZE_H
+
+
+
+
+
+#ifdef STB_IMAGE_RESIZE_IMPLEMENTATION
+
+#ifndef STBIR_ASSERT
+#include <assert.h>
+#define STBIR_ASSERT(x) assert(x)
+#endif
+
+// For memset
+#include <string.h>
+
+#include <math.h>
+
+#ifndef STBIR_MALLOC
+#include <stdlib.h>
+// use comma operator to evaluate c, to avoid "unused parameter" warnings
+#define STBIR_MALLOC(size,c) ((void)(c), malloc(size))
+#define STBIR_FREE(ptr,c) ((void)(c), free(ptr))
+#endif
+
+#ifndef _MSC_VER
+#ifdef __cplusplus
+#define stbir__inline inline
+#else
+#define stbir__inline
+#endif
+#else
+#define stbir__inline __forceinline
+#endif
+
+
+// should produce compiler error if size is wrong
+typedef unsigned char stbir__validate_uint32[sizeof(stbir_uint32) == 4 ? 1 : -1];
+
+#ifdef _MSC_VER
+#define STBIR__NOTUSED(v) (void)(v)
+#else
+#define STBIR__NOTUSED(v) (void)sizeof(v)
+#endif
+
+#define STBIR__ARRAY_SIZE(a) (sizeof((a))/sizeof((a)[0]))
+
+#ifndef STBIR_DEFAULT_FILTER_UPSAMPLE
+#define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_CATMULLROM
+#endif
+
+#ifndef STBIR_DEFAULT_FILTER_DOWNSAMPLE
+#define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_MITCHELL
+#endif
+
+#ifndef STBIR_PROGRESS_REPORT
+#define STBIR_PROGRESS_REPORT(float_0_to_1)
+#endif
+
+#ifndef STBIR_MAX_CHANNELS
+#define STBIR_MAX_CHANNELS 64
+#endif
+
+#if STBIR_MAX_CHANNELS > 65536
+#error "Too many channels; STBIR_MAX_CHANNELS must be no more than 65536."
+// because we store the indices in 16-bit variables
+#endif
+
+// This value is added to alpha just before premultiplication to avoid
+// zeroing out color values. It is equivalent to 2^-80. If you don't want
+// that behavior (it may interfere if you have floating point images with
+// very small alpha values) then you can define STBIR_NO_ALPHA_EPSILON to
+// disable it.
+#ifndef STBIR_ALPHA_EPSILON
+#define STBIR_ALPHA_EPSILON ((float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20))
+#endif
+
+
+
+#ifdef _MSC_VER
+#define STBIR__UNUSED_PARAM(v) (void)(v)
+#else
+#define STBIR__UNUSED_PARAM(v) (void)sizeof(v)
+#endif
+
+// must match stbir_datatype
+static unsigned char stbir__type_size[] = {
+ 1, // STBIR_TYPE_UINT8
+ 2, // STBIR_TYPE_UINT16
+ 4, // STBIR_TYPE_UINT32
+ 4, // STBIR_TYPE_FLOAT
+};
+
+// Kernel function centered at 0
+typedef float (stbir__kernel_fn)(float x, float scale);
+typedef float (stbir__support_fn)(float scale);
+
+typedef struct
+{
+ stbir__kernel_fn* kernel;
+ stbir__support_fn* support;
+} stbir__filter_info;
+
+// When upsampling, the contributors are which source pixels contribute.
+// When downsampling, the contributors are which destination pixels are contributed to.
+typedef struct
+{
+ int n0; // First contributing pixel
+ int n1; // Last contributing pixel
+} stbir__contributors;
+
+typedef struct
+{
+ const void* input_data;
+ int input_w;
+ int input_h;
+ int input_stride_bytes;
+
+ void* output_data;
+ int output_w;
+ int output_h;
+ int output_stride_bytes;
+
+ float s0, t0, s1, t1;
+
+ float horizontal_shift; // Units: output pixels
+ float vertical_shift; // Units: output pixels
+ float horizontal_scale;
+ float vertical_scale;
+
+ int channels;
+ int alpha_channel;
+ stbir_uint32 flags;
+ stbir_datatype type;
+ stbir_filter horizontal_filter;
+ stbir_filter vertical_filter;
+ stbir_edge edge_horizontal;
+ stbir_edge edge_vertical;
+ stbir_colorspace colorspace;
+
+ stbir__contributors* horizontal_contributors;
+ float* horizontal_coefficients;
+
+ stbir__contributors* vertical_contributors;
+ float* vertical_coefficients;
+
+ int decode_buffer_pixels;
+ float* decode_buffer;
+
+ float* horizontal_buffer;
+
+ // cache these because ceil/floor are inexplicably showing up in profile
+ int horizontal_coefficient_width;
+ int vertical_coefficient_width;
+ int horizontal_filter_pixel_width;
+ int vertical_filter_pixel_width;
+ int horizontal_filter_pixel_margin;
+ int vertical_filter_pixel_margin;
+ int horizontal_num_contributors;
+ int vertical_num_contributors;
+
+ int ring_buffer_length_bytes; // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
+ int ring_buffer_num_entries; // Total number of entries in the ring buffer.
+ int ring_buffer_first_scanline;
+ int ring_buffer_last_scanline;
+ int ring_buffer_begin_index; // first_scanline is at this index in the ring buffer
+ float* ring_buffer;
+
+ float* encode_buffer; // A temporary buffer to store floats so we don't lose precision while we do multiply-adds.
+
+ int horizontal_contributors_size;
+ int horizontal_coefficients_size;
+ int vertical_contributors_size;
+ int vertical_coefficients_size;
+ int decode_buffer_size;
+ int horizontal_buffer_size;
+ int ring_buffer_size;
+ int encode_buffer_size;
+} stbir__info;
+
+
+static const float stbir__max_uint8_as_float = 255.0f;
+static const float stbir__max_uint16_as_float = 65535.0f;
+static const double stbir__max_uint32_as_float = 4294967295.0;
+
+
+static stbir__inline int stbir__min(int a, int b)
+{
+ return a < b ? a : b;
+}
+
+static stbir__inline float stbir__saturate(float x)
+{
+ if (x < 0)
+ return 0;
+
+ if (x > 1)
+ return 1;
+
+ return x;
+}
+
+#ifdef STBIR_SATURATE_INT
+static stbir__inline stbir_uint8 stbir__saturate8(int x)
+{
+ if ((unsigned int) x <= 255)
+ return x;
+
+ if (x < 0)
+ return 0;
+
+ return 255;
+}
+
+static stbir__inline stbir_uint16 stbir__saturate16(int x)
+{
+ if ((unsigned int) x <= 65535)
+ return x;
+
+ if (x < 0)
+ return 0;
+
+ return 65535;
+}
+#endif
+
+static float stbir__srgb_uchar_to_linear_float[256] = {
+ 0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
+ 0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
+ 0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
+ 0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
+ 0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
+ 0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
+ 0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
+ 0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
+ 0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
+ 0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
+ 0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
+ 0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
+ 0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
+ 0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
+ 0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
+ 0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
+ 0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
+ 0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
+ 0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
+ 0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
+ 0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
+ 0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
+ 0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
+ 0.982251f, 0.991102f, 1.0f
+};
+
+static float stbir__srgb_to_linear(float f)
+{
+ if (f <= 0.04045f)
+ return f / 12.92f;
+ else
+ return (float)pow((f + 0.055f) / 1.055f, 2.4f);
+}
+
+static float stbir__linear_to_srgb(float f)
+{
+ if (f <= 0.0031308f)
+ return f * 12.92f;
+ else
+ return 1.055f * (float)pow(f, 1 / 2.4f) - 0.055f;
+}
+
+#ifndef STBIR_NON_IEEE_FLOAT
+// From https://gist.github.com/rygorous/2203834
+
+typedef union
+{
+ stbir_uint32 u;
+ float f;
+} stbir__FP32;
+
+static const stbir_uint32 fp32_to_srgb8_tab4[104] = {
+ 0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
+ 0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
+ 0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
+ 0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
+ 0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
+ 0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
+ 0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
+ 0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
+ 0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
+ 0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
+ 0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
+ 0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
+ 0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
+};
+
+static stbir_uint8 stbir__linear_to_srgb_uchar(float in)
+{
+ static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
+ static const stbir__FP32 minval = { (127-13) << 23 };
+ stbir_uint32 tab,bias,scale,t;
+ stbir__FP32 f;
+
+ // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
+ // The tests are carefully written so that NaNs map to 0, same as in the reference
+ // implementation.
+ if (!(in > minval.f)) // written this way to catch NaNs
+ in = minval.f;
+ if (in > almostone.f)
+ in = almostone.f;
+
+ // Do the table lookup and unpack bias, scale
+ f.f = in;
+ tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
+ bias = (tab >> 16) << 9;
+ scale = tab & 0xffff;
+
+ // Grab next-highest mantissa bits and perform linear interpolation
+ t = (f.u >> 12) & 0xff;
+ return (unsigned char) ((bias + scale*t) >> 16);
+}
+
+#else
+// sRGB transition values, scaled by 1<<28
+static int stbir__srgb_offset_to_linear_scaled[256] =
+{
+ 0, 40738, 122216, 203693, 285170, 366648, 448125, 529603,
+ 611080, 692557, 774035, 855852, 942009, 1033024, 1128971, 1229926,
+ 1335959, 1447142, 1563542, 1685229, 1812268, 1944725, 2082664, 2226148,
+ 2375238, 2529996, 2690481, 2856753, 3028870, 3206888, 3390865, 3580856,
+ 3776916, 3979100, 4187460, 4402049, 4622919, 4850123, 5083710, 5323731,
+ 5570236, 5823273, 6082892, 6349140, 6622065, 6901714, 7188133, 7481369,
+ 7781466, 8088471, 8402427, 8723380, 9051372, 9386448, 9728650, 10078021,
+ 10434603, 10798439, 11169569, 11548036, 11933879, 12327139, 12727857, 13136073,
+ 13551826, 13975156, 14406100, 14844697, 15290987, 15745007, 16206795, 16676389,
+ 17153826, 17639142, 18132374, 18633560, 19142734, 19659934, 20185196, 20718552,
+ 21260042, 21809696, 22367554, 22933648, 23508010, 24090680, 24681686, 25281066,
+ 25888850, 26505076, 27129772, 27762974, 28404716, 29055026, 29713942, 30381490,
+ 31057708, 31742624, 32436272, 33138682, 33849884, 34569912, 35298800, 36036568,
+ 36783260, 37538896, 38303512, 39077136, 39859796, 40651528, 41452360, 42262316,
+ 43081432, 43909732, 44747252, 45594016, 46450052, 47315392, 48190064, 49074096,
+ 49967516, 50870356, 51782636, 52704392, 53635648, 54576432, 55526772, 56486700,
+ 57456236, 58435408, 59424248, 60422780, 61431036, 62449032, 63476804, 64514376,
+ 65561776, 66619028, 67686160, 68763192, 69850160, 70947088, 72053992, 73170912,
+ 74297864, 75434880, 76581976, 77739184, 78906536, 80084040, 81271736, 82469648,
+ 83677792, 84896192, 86124888, 87363888, 88613232, 89872928, 91143016, 92423512,
+ 93714432, 95015816, 96327688, 97650056, 98982952, 100326408, 101680440, 103045072,
+ 104420320, 105806224, 107202800, 108610064, 110028048, 111456776, 112896264, 114346544,
+ 115807632, 117279552, 118762328, 120255976, 121760536, 123276016, 124802440, 126339832,
+ 127888216, 129447616, 131018048, 132599544, 134192112, 135795792, 137410592, 139036528,
+ 140673648, 142321952, 143981456, 145652208, 147334208, 149027488, 150732064, 152447968,
+ 154175200, 155913792, 157663776, 159425168, 161197984, 162982240, 164777968, 166585184,
+ 168403904, 170234160, 172075968, 173929344, 175794320, 177670896, 179559120, 181458992,
+ 183370528, 185293776, 187228736, 189175424, 191133888, 193104112, 195086128, 197079968,
+ 199085648, 201103184, 203132592, 205173888, 207227120, 209292272, 211369392, 213458480,
+ 215559568, 217672656, 219797792, 221934976, 224084240, 226245600, 228419056, 230604656,
+ 232802400, 235012320, 237234432, 239468736, 241715280, 243974080, 246245120, 248528464,
+ 250824112, 253132064, 255452368, 257785040, 260130080, 262487520, 264857376, 267239664,
+};
+
+static stbir_uint8 stbir__linear_to_srgb_uchar(float f)
+{
+ int x = (int) (f * (1 << 28)); // has headroom so you don't need to clamp
+ int v = 0;
+ int i;
+
+ // Refine the guess with a short binary search.
+ i = v + 128; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 64; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 32; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 16; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 8; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 4; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 2; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 1; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+
+ return (stbir_uint8) v;
+}
+#endif
+
+static float stbir__filter_trapezoid(float x, float scale)
+{
+ float halfscale = scale / 2;
+ float t = 0.5f + halfscale;
+ STBIR_ASSERT(scale <= 1);
+
+ x = (float)fabs(x);
+
+ if (x >= t)
+ return 0;
+ else
+ {
+ float r = 0.5f - halfscale;
+ if (x <= r)
+ return 1;
+ else
+ return (t - x) / scale;
+ }
+}
+
+static float stbir__support_trapezoid(float scale)
+{
+ STBIR_ASSERT(scale <= 1);
+ return 0.5f + scale / 2;
+}
+
+static float stbir__filter_triangle(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x <= 1.0f)
+ return 1 - x;
+ else
+ return 0;
+}
+
+static float stbir__filter_cubic(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x < 1.0f)
+ return (4 + x*x*(3*x - 6))/6;
+ else if (x < 2.0f)
+ return (8 + x*(-12 + x*(6 - x)))/6;
+
+ return (0.0f);
+}
+
+static float stbir__filter_catmullrom(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x < 1.0f)
+ return 1 - x*x*(2.5f - 1.5f*x);
+ else if (x < 2.0f)
+ return 2 - x*(4 + x*(0.5f*x - 2.5f));
+
+ return (0.0f);
+}
+
+static float stbir__filter_mitchell(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x < 1.0f)
+ return (16 + x*x*(21 * x - 36))/18;
+ else if (x < 2.0f)
+ return (32 + x*(-60 + x*(36 - 7*x)))/18;
+
+ return (0.0f);
+}
+
+static float stbir__support_zero(float s)
+{
+ STBIR__UNUSED_PARAM(s);
+ return 0;
+}
+
+static float stbir__support_one(float s)
+{
+ STBIR__UNUSED_PARAM(s);
+ return 1;
+}
+
+static float stbir__support_two(float s)
+{
+ STBIR__UNUSED_PARAM(s);
+ return 2;
+}
+
+static stbir__filter_info stbir__filter_info_table[] = {
+ { NULL, stbir__support_zero },
+ { stbir__filter_trapezoid, stbir__support_trapezoid },
+ { stbir__filter_triangle, stbir__support_one },
+ { stbir__filter_cubic, stbir__support_two },
+ { stbir__filter_catmullrom, stbir__support_two },
+ { stbir__filter_mitchell, stbir__support_two },
+};
+
+stbir__inline static int stbir__use_upsampling(float ratio)
+{
+ return ratio > 1;
+}
+
+stbir__inline static int stbir__use_width_upsampling(stbir__info* stbir_info)
+{
+ return stbir__use_upsampling(stbir_info->horizontal_scale);
+}
+
+stbir__inline static int stbir__use_height_upsampling(stbir__info* stbir_info)
+{
+ return stbir__use_upsampling(stbir_info->vertical_scale);
+}
+
+// This is the maximum number of input samples that can affect an output sample
+// with the given filter
+static int stbir__get_filter_pixel_width(stbir_filter filter, float scale)
+{
+ STBIR_ASSERT(filter != 0);
+ STBIR_ASSERT(filter < STBIR__ARRAY_SIZE(stbir__filter_info_table));
+
+ if (stbir__use_upsampling(scale))
+ return (int)ceil(stbir__filter_info_table[filter].support(1/scale) * 2);
+ else
+ return (int)ceil(stbir__filter_info_table[filter].support(scale) * 2 / scale);
+}
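+
+// Worked example (values trace through the formulas above): for the
+// triangle filter (support == 1), upsampling at scale == 2 gives
+// ceil(support(0.5) * 2) == 2 input samples per output sample, while
+// downsampling at scale == 0.5 gives ceil(support(0.5) * 2 / 0.5) == 4.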
+
+// This is how much to expand buffers to account for filters seeking outside
+// the image boundaries.
+static int stbir__get_filter_pixel_margin(stbir_filter filter, float scale)
+{
+ return stbir__get_filter_pixel_width(filter, scale) / 2;
+}
+
+static int stbir__get_coefficient_width(stbir_filter filter, float scale)
+{
+ if (stbir__use_upsampling(scale))
+ return (int)ceil(stbir__filter_info_table[filter].support(1 / scale) * 2);
+ else
+ return (int)ceil(stbir__filter_info_table[filter].support(scale) * 2);
+}
+
+static int stbir__get_contributors(float scale, stbir_filter filter, int input_size, int output_size)
+{
+ if (stbir__use_upsampling(scale))
+ return output_size;
+ else
+ return (input_size + stbir__get_filter_pixel_margin(filter, scale) * 2);
+}
+
+static int stbir__get_total_horizontal_coefficients(stbir__info* info)
+{
+ return info->horizontal_num_contributors
+ * stbir__get_coefficient_width (info->horizontal_filter, info->horizontal_scale);
+}
+
+static int stbir__get_total_vertical_coefficients(stbir__info* info)
+{
+ return info->vertical_num_contributors
+ * stbir__get_coefficient_width (info->vertical_filter, info->vertical_scale);
+}
+
+static stbir__contributors* stbir__get_contributor(stbir__contributors* contributors, int n)
+{
+ return &contributors[n];
+}
+
+// For perf reasons this code is duplicated in stbir__resample_horizontal_upsample/downsample;
+// if you change it here, change it there too.
+static float* stbir__get_coefficient(float* coefficients, stbir_filter filter, float scale, int n, int c)
+{
+ int width = stbir__get_coefficient_width(filter, scale);
+ return &coefficients[width*n + c];
+}
+
+static int stbir__edge_wrap_slow(stbir_edge edge, int n, int max)
+{
+ switch (edge)
+ {
+ case STBIR_EDGE_ZERO:
+ return 0; // we'll decode the wrong pixel here, and then overwrite with 0s later
+
+ case STBIR_EDGE_CLAMP:
+ if (n < 0)
+ return 0;
+
+ if (n >= max)
+ return max - 1;
+
+ return n; // NOTREACHED
+
+ case STBIR_EDGE_REFLECT:
+ {
+ if (n < 0)
+ {
+ if (-n < max)
+ return -n;
+ else
+ return max - 1;
+ }
+
+ if (n >= max)
+ {
+ int max2 = max * 2;
+ if (n >= max2)
+ return 0;
+ else
+ return max2 - n - 1;
+ }
+
+ return n; // NOTREACHED
+ }
+
+ case STBIR_EDGE_WRAP:
+ if (n >= 0)
+ return (n % max);
+ else
+ {
+ int m = (-n) % max;
+
+ if (m != 0)
+ m = max - m;
+
+ return (m);
+ }
+ // NOTREACHED
+
+ default:
+ STBIR_ASSERT(!"Unimplemented edge type");
+ return 0;
+ }
+}
+
+stbir__inline static int stbir__edge_wrap(stbir_edge edge, int n, int max)
+{
+ // avoid per-pixel switch
+ if (n >= 0 && n < max)
+ return n;
+ return stbir__edge_wrap_slow(edge, n, max);
+}
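+
+// Edge-mode behavior, traced through stbir__edge_wrap_slow for max == 4:
+//   STBIR_EDGE_CLAMP:   n == -2 -> 0,  n == 5 -> 3
+//   STBIR_EDGE_REFLECT: n == -2 -> 2,  n == 5 -> max*2 - 5 - 1 == 2
+//   STBIR_EDGE_WRAP:    n == -1 -> 3,  n == 5 -> 1
+//   STBIR_EDGE_ZERO:    always 0 (the decoded value is overwritten with zeros later)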
+
+// What input pixels contribute to this output pixel?
+static void stbir__calculate_sample_range_upsample(int n, float out_filter_radius, float scale_ratio, float out_shift, int* in_first_pixel, int* in_last_pixel, float* in_center_of_out)
+{
+ float out_pixel_center = (float)n + 0.5f;
+ float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
+ float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
+
+ float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) / scale_ratio;
+ float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) / scale_ratio;
+
+ *in_center_of_out = (out_pixel_center + out_shift) / scale_ratio;
+ *in_first_pixel = (int)(floor(in_pixel_influence_lowerbound + 0.5));
+ *in_last_pixel = (int)(floor(in_pixel_influence_upperbound - 0.5));
+}
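+
+// Worked example (illustrative): upsampling 4 -> 8 with the triangle filter,
+// scale_ratio == 2, out_filter_radius == 2, out_shift == 0. For output pixel
+// n == 3: out_pixel_center == 3.5, influence bounds 1.5..5.5 in output space
+// map to 0.75..2.75 in input space, giving *in_first_pixel == 1,
+// *in_last_pixel == 2 and *in_center_of_out == 1.75.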
+
+// What output pixels does this input pixel contribute to?
+static void stbir__calculate_sample_range_downsample(int n, float in_pixels_radius, float scale_ratio, float out_shift, int* out_first_pixel, int* out_last_pixel, float* out_center_of_in)
+{
+ float in_pixel_center = (float)n + 0.5f;
+ float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
+ float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
+
+ float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale_ratio - out_shift;
+ float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale_ratio - out_shift;
+
+ *out_center_of_in = in_pixel_center * scale_ratio - out_shift;
+ *out_first_pixel = (int)(floor(out_pixel_influence_lowerbound + 0.5));
+ *out_last_pixel = (int)(floor(out_pixel_influence_upperbound - 0.5));
+}
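+
+// Worked example (illustrative): downsampling 8 -> 4 with the triangle
+// filter, scale_ratio == 0.5, in_pixels_radius == 2, out_shift == 0. For
+// input pixel n == 3: in_pixel_center == 3.5, influence bounds 1.5..5.5 in
+// input space map to 0.75..2.75 in output space, giving *out_first_pixel == 1,
+// *out_last_pixel == 2 and *out_center_of_in == 1.75.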
+
+static void stbir__calculate_coefficients_upsample(stbir_filter filter, float scale, int in_first_pixel, int in_last_pixel, float in_center_of_out, stbir__contributors* contributor, float* coefficient_group)
+{
+ int i;
+ float total_filter = 0;
+ float filter_scale;
+
+ STBIR_ASSERT(in_last_pixel - in_first_pixel <= (int)ceil(stbir__filter_info_table[filter].support(1/scale) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
+
+ contributor->n0 = in_first_pixel;
+ contributor->n1 = in_last_pixel;
+
+ STBIR_ASSERT(contributor->n1 >= contributor->n0);
+
+ for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
+ {
+ float in_pixel_center = (float)(i + in_first_pixel) + 0.5f;
+ coefficient_group[i] = stbir__filter_info_table[filter].kernel(in_center_of_out - in_pixel_center, 1 / scale);
+
+ // If the coefficient is zero, skip it. (Don't do the <0 check here, we want the influence of those outside pixels.)
+ if (i == 0 && !coefficient_group[i])
+ {
+ contributor->n0 = ++in_first_pixel;
+ i--;
+ continue;
+ }
+
+ total_filter += coefficient_group[i];
+ }
+
+ // NOTE(fg): Not actually true in general, nor is there any reason to expect it should be.
+ // It would be true in exact math but is at best approximately true in floating-point math,
+ // and it would not make sense to try and put actual bounds on this here because it depends
+ // on the image aspect ratio which can get pretty extreme.
+ //STBIR_ASSERT(stbir__filter_info_table[filter].kernel((float)(in_last_pixel + 1) + 0.5f - in_center_of_out, 1/scale) == 0);
+
+ STBIR_ASSERT(total_filter > 0.9f);
+ STBIR_ASSERT(total_filter < 1.1f); // Make sure it's not way off.
+
+ // Make sure the sum of all coefficients is 1.
+ filter_scale = 1 / total_filter;
+
+ for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
+ coefficient_group[i] *= filter_scale;
+
+ for (i = in_last_pixel - in_first_pixel; i >= 0; i--)
+ {
+ if (coefficient_group[i])
+ break;
+
+ // This line has no weight. We can skip it.
+ contributor->n1 = contributor->n0 + i - 1;
+ }
+}
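+
+// For example, with the triangle filter and in_center_of_out == 1.75,
+// contributing pixels 1..2 get kernel values 1 - |1.75 - 1.5| == 0.75 and
+// 1 - |1.75 - 2.5| == 0.25; they already sum to 1, so the normalization
+// pass above leaves them unchanged.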
+
+static void stbir__calculate_coefficients_downsample(stbir_filter filter, float scale_ratio, int out_first_pixel, int out_last_pixel, float out_center_of_in, stbir__contributors* contributor, float* coefficient_group)
+{
+ int i;
+
+ STBIR_ASSERT(out_last_pixel - out_first_pixel <= (int)ceil(stbir__filter_info_table[filter].support(scale_ratio) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
+
+ contributor->n0 = out_first_pixel;
+ contributor->n1 = out_last_pixel;
+
+ STBIR_ASSERT(contributor->n1 >= contributor->n0);
+
+ for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
+ {
+ float out_pixel_center = (float)(i + out_first_pixel) + 0.5f;
+ float x = out_pixel_center - out_center_of_in;
+ coefficient_group[i] = stbir__filter_info_table[filter].kernel(x, scale_ratio) * scale_ratio;
+ }
+
+ // NOTE(fg): Not actually true in general, nor is there any reason to expect it should be.
+ // It would be true in exact math but is at best approximately true in floating-point math,
+ // and it would not make sense to try and put actual bounds on this here because it depends
+ // on the image aspect ratio which can get pretty extreme.
+ //STBIR_ASSERT(stbir__filter_info_table[filter].kernel((float)(out_last_pixel + 1) + 0.5f - out_center_of_in, scale_ratio) == 0);
+
+ for (i = out_last_pixel - out_first_pixel; i >= 0; i--)
+ {
+ if (coefficient_group[i])
+ break;
+
+ // This line has no weight. We can skip it.
+ contributor->n1 = contributor->n0 + i - 1;
+ }
+}
+
+static void stbir__normalize_downsample_coefficients(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, int input_size, int output_size)
+{
+ int num_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
+ int num_coefficients = stbir__get_coefficient_width(filter, scale_ratio);
+ int i, j;
+ int skip;
+
+ for (i = 0; i < output_size; i++)
+ {
+ float scale;
+ float total = 0;
+
+ for (j = 0; j < num_contributors; j++)
+ {
+ if (i >= contributors[j].n0 && i <= contributors[j].n1)
+ {
+ float coefficient = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0);
+ total += coefficient;
+ }
+ else if (i < contributors[j].n0)
+ break;
+ }
+
+ STBIR_ASSERT(total > 0.9f);
+ STBIR_ASSERT(total < 1.1f);
+
+ scale = 1 / total;
+
+ for (j = 0; j < num_contributors; j++)
+ {
+ if (i >= contributors[j].n0 && i <= contributors[j].n1)
+ *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0) *= scale;
+ else if (i < contributors[j].n0)
+ break;
+ }
+ }
+
+ // Optimize: Skip zero coefficients and contributions outside of image bounds.
+ // Do this after normalizing because normalization depends on the n0/n1 values.
+ for (j = 0; j < num_contributors; j++)
+ {
+ int range, max, width;
+
+ skip = 0;
+ while (*stbir__get_coefficient(coefficients, filter, scale_ratio, j, skip) == 0)
+ skip++;
+
+ contributors[j].n0 += skip;
+
+ while (contributors[j].n0 < 0)
+ {
+ contributors[j].n0++;
+ skip++;
+ }
+
+ range = contributors[j].n1 - contributors[j].n0 + 1;
+ max = stbir__min(num_coefficients, range);
+
+ width = stbir__get_coefficient_width(filter, scale_ratio);
+ for (i = 0; i < max; i++)
+ {
+ if (i + skip >= width)
+ break;
+
+ *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i) = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i + skip);
+ }
+ }
+
+ // Using min to avoid writing into invalid pixels.
+ for (i = 0; i < num_contributors; i++)
+ contributors[i].n1 = stbir__min(contributors[i].n1, output_size - 1);
+}
+
+// Each scan line uses the same kernel values, so calculate them once and
+// reuse them for every scan line.
+static void stbir__calculate_filters(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, float shift, int input_size, int output_size)
+{
+ int n;
+ int total_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
+
+ if (stbir__use_upsampling(scale_ratio))
+ {
+ float out_pixels_radius = stbir__filter_info_table[filter].support(1 / scale_ratio) * scale_ratio;
+
+ // Looping through out pixels
+ for (n = 0; n < total_contributors; n++)
+ {
+ float in_center_of_out; // Center of the current out pixel in the in pixel space
+ int in_first_pixel, in_last_pixel;
+
+ stbir__calculate_sample_range_upsample(n, out_pixels_radius, scale_ratio, shift, &in_first_pixel, &in_last_pixel, &in_center_of_out);
+
+ stbir__calculate_coefficients_upsample(filter, scale_ratio, in_first_pixel, in_last_pixel, in_center_of_out, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
+ }
+ }
+ else
+ {
+ float in_pixels_radius = stbir__filter_info_table[filter].support(scale_ratio) / scale_ratio;
+
+ // Looping through in pixels
+ for (n = 0; n < total_contributors; n++)
+ {
+ float out_center_of_in; // Center of the current in pixel in the out pixel space
+ int out_first_pixel, out_last_pixel;
+ int n_adjusted = n - stbir__get_filter_pixel_margin(filter, scale_ratio);
+
+ stbir__calculate_sample_range_downsample(n_adjusted, in_pixels_radius, scale_ratio, shift, &out_first_pixel, &out_last_pixel, &out_center_of_in);
+
+ stbir__calculate_coefficients_downsample(filter, scale_ratio, out_first_pixel, out_last_pixel, out_center_of_in, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
+ }
+
+ stbir__normalize_downsample_coefficients(contributors, coefficients, filter, scale_ratio, input_size, output_size);
+ }
+}
+
+static float* stbir__get_decode_buffer(stbir__info* stbir_info)
+{
+ // The 0 index of the decode buffer starts after the margin. This makes
+ // it okay to use negative indexes on the decode buffer.
+ return &stbir_info->decode_buffer[stbir_info->horizontal_filter_pixel_margin * stbir_info->channels];
+}
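+
+// E.g., with a margin of 2 and 3 channels, the buffer holds pixels
+// -2..input_w+1; the pointer returned above is advanced by 2*3 floats, so
+// decode_buffer[-2 * 3] is the first float actually allocated.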
+
+#define STBIR__DECODE(type, colorspace) ((int)(type) * (STBIR_MAX_COLORSPACES) + (int)(colorspace))
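+
+// STBIR__DECODE packs a (type, colorspace) pair into one switch key; e.g.,
+// assuming STBIR_MAX_COLORSPACES == 2, STBIR__DECODE(STBIR_TYPE_UINT16,
+// STBIR_COLORSPACE_SRGB) == STBIR_TYPE_UINT16 * 2 + 1, distinct for every
+// valid pair.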
+
+static void stbir__decode_scanline(stbir__info* stbir_info, int n)
+{
+ int c;
+ int channels = stbir_info->channels;
+ int alpha_channel = stbir_info->alpha_channel;
+ int type = stbir_info->type;
+ int colorspace = stbir_info->colorspace;
+ int input_w = stbir_info->input_w;
+ size_t input_stride_bytes = stbir_info->input_stride_bytes;
+ float* decode_buffer = stbir__get_decode_buffer(stbir_info);
+ stbir_edge edge_horizontal = stbir_info->edge_horizontal;
+ stbir_edge edge_vertical = stbir_info->edge_vertical;
+ size_t in_buffer_row_offset = stbir__edge_wrap(edge_vertical, n, stbir_info->input_h) * input_stride_bytes;
+ const void* input_data = (char *) stbir_info->input_data + in_buffer_row_offset;
+ int max_x = input_w + stbir_info->horizontal_filter_pixel_margin;
+ int decode = STBIR__DECODE(type, colorspace);
+
+ int x = -stbir_info->horizontal_filter_pixel_margin;
+
+ // Special case for STBIR_EDGE_ZERO: it must produce a value that never appears
+ // in the input, and we don't want to pay that overhead per pixel in the other edge modes.
+ if (edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info->input_h))
+ {
+ for (; x < max_x; x++)
+ for (c = 0; c < channels; c++)
+ decode_buffer[x*channels + c] = 0;
+ return;
+ }
+
+ switch (decode)
+ {
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = ((float)((const unsigned char*)input_data)[input_pixel_index + c]) / stbir__max_uint8_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[((const unsigned char*)input_data)[input_pixel_index + c]];
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = ((float)((const unsigned char*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint8_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = ((float)((const unsigned short*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(((float)((const unsigned short*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float);
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = ((float)((const unsigned short*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint16_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = (float)(((double)((const unsigned int*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((float)(((double)((const unsigned int*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float));
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = (float)(((double)((const unsigned int*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint32_as_float);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = ((const float*)input_data)[input_pixel_index + c];
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(((const float*)input_data)[input_pixel_index + c]);
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = ((const float*)input_data)[input_pixel_index + alpha_channel];
+ }
+
+ break;
+
+ default:
+ STBIR_ASSERT(!"Unknown type/colorspace/channels combination.");
+ break;
+ }
+
+ if (!(stbir_info->flags & STBIR_FLAG_ALPHA_PREMULTIPLIED))
+ {
+ for (x = -stbir_info->horizontal_filter_pixel_margin; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+
+ // If the alpha value is 0, premultiplying would wipe out the color values.
+ // Bias it with a small epsilon (non-float types only, unless STBIR_NO_ALPHA_EPSILON) so it's not.
+ float alpha = decode_buffer[decode_pixel_index + alpha_channel];
+#ifndef STBIR_NO_ALPHA_EPSILON
+ if (stbir_info->type != STBIR_TYPE_FLOAT) {
+ alpha += STBIR_ALPHA_EPSILON;
+ decode_buffer[decode_pixel_index + alpha_channel] = alpha;
+ }
+#endif
+ for (c = 0; c < channels; c++)
+ {
+ if (c == alpha_channel)
+ continue;
+
+ decode_buffer[decode_pixel_index + c] *= alpha;
+ }
+ }
+ }
+
+ if (edge_horizontal == STBIR_EDGE_ZERO)
+ {
+ for (x = -stbir_info->horizontal_filter_pixel_margin; x < 0; x++)
+ {
+ for (c = 0; c < channels; c++)
+ decode_buffer[x*channels + c] = 0;
+ }
+ for (x = input_w; x < max_x; x++)
+ {
+ for (c = 0; c < channels; c++)
+ decode_buffer[x*channels + c] = 0;
+ }
+ }
+}
+
+static float* stbir__get_ring_buffer_entry(float* ring_buffer, int index, int ring_buffer_length)
+{
+ return &ring_buffer[index * ring_buffer_length];
+}
+
+static float* stbir__add_empty_ring_buffer_entry(stbir__info* stbir_info, int n)
+{
+ int ring_buffer_index;
+ float* ring_buffer;
+
+ stbir_info->ring_buffer_last_scanline = n;
+
+ if (stbir_info->ring_buffer_begin_index < 0)
+ {
+ ring_buffer_index = stbir_info->ring_buffer_begin_index = 0;
+ stbir_info->ring_buffer_first_scanline = n;
+ }
+ else
+ {
+ ring_buffer_index = (stbir_info->ring_buffer_begin_index + (stbir_info->ring_buffer_last_scanline - stbir_info->ring_buffer_first_scanline)) % stbir_info->ring_buffer_num_entries;
+ STBIR_ASSERT(ring_buffer_index != stbir_info->ring_buffer_begin_index);
+ }
+
+ ring_buffer = stbir__get_ring_buffer_entry(stbir_info->ring_buffer, ring_buffer_index, stbir_info->ring_buffer_length_bytes / sizeof(float));
+ memset(ring_buffer, 0, stbir_info->ring_buffer_length_bytes);
+
+ return ring_buffer;
+}
+
+
+static void stbir__resample_horizontal_upsample(stbir__info* stbir_info, float* output_buffer)
+{
+ int x, k;
+ int output_w = stbir_info->output_w;
+ int channels = stbir_info->channels;
+ float* decode_buffer = stbir__get_decode_buffer(stbir_info);
+ stbir__contributors* horizontal_contributors = stbir_info->horizontal_contributors;
+ float* horizontal_coefficients = stbir_info->horizontal_coefficients;
+ int coefficient_width = stbir_info->horizontal_coefficient_width;
+
+ for (x = 0; x < output_w; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int out_pixel_index = x * channels;
+ int coefficient_group = coefficient_width * x;
+ int coefficient_counter = 0;
+
+ STBIR_ASSERT(n1 >= n0);
+ STBIR_ASSERT(n0 >= -stbir_info->horizontal_filter_pixel_margin);
+ STBIR_ASSERT(n1 >= -stbir_info->horizontal_filter_pixel_margin);
+ STBIR_ASSERT(n0 < stbir_info->input_w + stbir_info->horizontal_filter_pixel_margin);
+ STBIR_ASSERT(n1 < stbir_info->input_w + stbir_info->horizontal_filter_pixel_margin);
+
+ switch (channels) {
+ case 1:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 1;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ }
+ break;
+ case 2:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 2;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ }
+ break;
+ case 3:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 3;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ }
+ break;
+ case 4:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 4;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
+ }
+ break;
+ default:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * channels;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ int c;
+ STBIR_ASSERT(coefficient != 0);
+ for (c = 0; c < channels; c++)
+ output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
+ }
+ break;
+ }
+ }
+}
+
+static void stbir__resample_horizontal_downsample(stbir__info* stbir_info, float* output_buffer)
+{
+ int x, k;
+ int input_w = stbir_info->input_w;
+ int channels = stbir_info->channels;
+ float* decode_buffer = stbir__get_decode_buffer(stbir_info);
+ stbir__contributors* horizontal_contributors = stbir_info->horizontal_contributors;
+ float* horizontal_coefficients = stbir_info->horizontal_coefficients;
+ int coefficient_width = stbir_info->horizontal_coefficient_width;
+ int filter_pixel_margin = stbir_info->horizontal_filter_pixel_margin;
+ int max_x = input_w + filter_pixel_margin * 2;
+
+ STBIR_ASSERT(!stbir__use_width_upsampling(stbir_info));
+
+ switch (channels) {
+ case 1:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 1;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 1;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ }
+ }
+ break;
+
+ case 2:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 2;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 2;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ }
+ }
+ break;
+
+ case 3:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 3;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 3;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ }
+ }
+ break;
+
+ case 4:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 4;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 4;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
+ }
+ }
+ break;
+
+ default:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * channels;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int c;
+ int out_pixel_index = k * channels;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ for (c = 0; c < channels; c++)
+ output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
+ }
+ }
+ break;
+ }
+}
+
+static void stbir__decode_and_resample_upsample(stbir__info* stbir_info, int n)
+{
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline(stbir_info, n);
+
+ // Now resample it into the ring buffer.
+ if (stbir__use_width_upsampling(stbir_info))
+ stbir__resample_horizontal_upsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
+ else
+ stbir__resample_horizontal_downsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
+
+ // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
+}
+
+static void stbir__decode_and_resample_downsample(stbir__info* stbir_info, int n)
+{
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline(stbir_info, n);
+
+ memset(stbir_info->horizontal_buffer, 0, stbir_info->output_w * stbir_info->channels * sizeof(float));
+
+ // Now resample it into the horizontal buffer.
+ if (stbir__use_width_upsampling(stbir_info))
+ stbir__resample_horizontal_upsample(stbir_info, stbir_info->horizontal_buffer);
+ else
+ stbir__resample_horizontal_downsample(stbir_info, stbir_info->horizontal_buffer);
+
+ // Now it's sitting in the horizontal buffer ready to be distributed into the ring buffers.
+}
+
+// Get the specified scan line from the ring buffer.
+static float* stbir__get_ring_buffer_scanline(int get_scanline, float* ring_buffer, int begin_index, int first_scanline, int ring_buffer_num_entries, int ring_buffer_length)
+{
+ int ring_buffer_index = (begin_index + (get_scanline - first_scanline)) % ring_buffer_num_entries;
+ return stbir__get_ring_buffer_entry(ring_buffer, ring_buffer_index, ring_buffer_length);
+}
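+
+// Example of the index math above: with begin_index == 1, first_scanline == 10
+// and ring_buffer_num_entries == 4, scanline 12 lives at entry
+// (1 + (12 - 10)) % 4 == 3, and scanline 13 wraps back to entry 0.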
+
+
+static void stbir__encode_scanline(stbir__info* stbir_info, int num_pixels, void *output_buffer, float *encode_buffer, int channels, int alpha_channel, int decode)
+{
+ int x;
+ int n;
+ int num_nonalpha;
+ stbir_uint16 nonalpha[STBIR_MAX_CHANNELS];
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_PREMULTIPLIED))
+ {
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ float alpha = encode_buffer[pixel_index + alpha_channel];
+ float reciprocal_alpha = alpha ? 1.0f / alpha : 0;
+
+ // unrolling this produced a 1% slowdown upscaling a large RGBA linear-space image on my machine - stb
+ for (n = 0; n < channels; n++)
+ if (n != alpha_channel)
+ encode_buffer[pixel_index + n] *= reciprocal_alpha;
+
+ // We added in a small epsilon to prevent the color channel from being deleted with zero alpha.
+ // Because we only add it for integer types, it will automatically be discarded on integer
+ // conversion, so we don't need to subtract it back out (which would be problematic for
+ // numeric precision reasons).
+ }
+ }
+
+ // build a table of all channels that need colorspace correction, so
+ // we don't perform colorspace correction on channels that don't need it.
+ for (x = 0, num_nonalpha = 0; x < channels; ++x)
+ {
+ if (x != alpha_channel || (stbir_info->flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ {
+ nonalpha[num_nonalpha++] = (stbir_uint16)x;
+ }
+ }
+
+ #define STBIR__ROUND_INT(f) ((int) ((f)+0.5))
+ #define STBIR__ROUND_UINT(f) ((stbir_uint32) ((f)+0.5))
+
+ #ifdef STBIR__SATURATE_INT
+ #define STBIR__ENCODE_LINEAR8(f) stbir__saturate8 (STBIR__ROUND_INT((f) * stbir__max_uint8_as_float ))
+ #define STBIR__ENCODE_LINEAR16(f) stbir__saturate16(STBIR__ROUND_INT((f) * stbir__max_uint16_as_float))
+ #else
+ #define STBIR__ENCODE_LINEAR8(f) (unsigned char ) STBIR__ROUND_INT(stbir__saturate(f) * stbir__max_uint8_as_float )
+ #define STBIR__ENCODE_LINEAR16(f) (unsigned short) STBIR__ROUND_INT(stbir__saturate(f) * stbir__max_uint16_as_float)
+ #endif
+
+ switch (decode)
+ {
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((unsigned char*)output_buffer)[index] = STBIR__ENCODE_LINEAR8(encode_buffer[index]);
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((unsigned char*)output_buffer)[index] = stbir__linear_to_srgb_uchar(encode_buffer[index]);
+ }
+
+ if (!(stbir_info->flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((unsigned char *)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR8(encode_buffer[pixel_index+alpha_channel]);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((unsigned short*)output_buffer)[index] = STBIR__ENCODE_LINEAR16(encode_buffer[index]);
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((unsigned short*)output_buffer)[index] = (unsigned short)STBIR__ROUND_INT(stbir__linear_to_srgb(stbir__saturate(encode_buffer[index])) * stbir__max_uint16_as_float);
+ }
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((unsigned short*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR16(encode_buffer[pixel_index + alpha_channel]);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((unsigned int*)output_buffer)[index] = (unsigned int)STBIR__ROUND_UINT(((double)stbir__saturate(encode_buffer[index])) * stbir__max_uint32_as_float);
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((unsigned int*)output_buffer)[index] = (unsigned int)STBIR__ROUND_UINT(((double)stbir__linear_to_srgb(stbir__saturate(encode_buffer[index]))) * stbir__max_uint32_as_float);
+ }
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((unsigned int*)output_buffer)[pixel_index + alpha_channel] = (unsigned int)STBIR__ROUND_UINT(((double)stbir__saturate(encode_buffer[pixel_index + alpha_channel])) * stbir__max_uint32_as_float);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((float*)output_buffer)[index] = encode_buffer[index];
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((float*)output_buffer)[index] = stbir__linear_to_srgb(encode_buffer[index]);
+ }
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((float*)output_buffer)[pixel_index + alpha_channel] = encode_buffer[pixel_index + alpha_channel];
+ }
+ break;
+
+ default:
+ STBIR_ASSERT(!"Unknown type/colorspace/channels combination.");
+ break;
+ }
+}
+
+static void stbir__resample_vertical_upsample(stbir__info* stbir_info, int n)
+{
+ int x, k;
+ int output_w = stbir_info->output_w;
+ stbir__contributors* vertical_contributors = stbir_info->vertical_contributors;
+ float* vertical_coefficients = stbir_info->vertical_coefficients;
+ int channels = stbir_info->channels;
+ int alpha_channel = stbir_info->alpha_channel;
+ int type = stbir_info->type;
+ int colorspace = stbir_info->colorspace;
+ int ring_buffer_entries = stbir_info->ring_buffer_num_entries;
+ void* output_data = stbir_info->output_data;
+ float* encode_buffer = stbir_info->encode_buffer;
+ int decode = STBIR__DECODE(type, colorspace);
+ int coefficient_width = stbir_info->vertical_coefficient_width;
+ int coefficient_counter;
+ int contributor = n;
+
+ float* ring_buffer = stbir_info->ring_buffer;
+ int ring_buffer_begin_index = stbir_info->ring_buffer_begin_index;
+ int ring_buffer_first_scanline = stbir_info->ring_buffer_first_scanline;
+ int ring_buffer_length = stbir_info->ring_buffer_length_bytes/sizeof(float);
+
+ int n0,n1, output_row_start;
+ int coefficient_group = coefficient_width * contributor;
+
+ n0 = vertical_contributors[contributor].n0;
+ n1 = vertical_contributors[contributor].n1;
+
+ output_row_start = n * stbir_info->output_stride_bytes;
+
+ STBIR_ASSERT(stbir__use_height_upsampling(stbir_info));
+
+ memset(encode_buffer, 0, output_w * sizeof(float) * channels);
+
+ // I tried reblocking this for better cache usage of encode_buffer
+ // (using x_outer, k, x_inner), but it lost speed. -- stb
+
+ coefficient_counter = 0;
+ switch (channels) {
+ case 1:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 1;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ }
+ }
+ break;
+ case 2:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 2;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
+ }
+ }
+ break;
+ case 3:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 3;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
+ encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
+ }
+ }
+ break;
+ case 4:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 4;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
+ encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
+ encode_buffer[in_pixel_index + 3] += ring_buffer_entry[in_pixel_index + 3] * coefficient;
+ }
+ }
+ break;
+ default:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * channels;
+ int c;
+ for (c = 0; c < channels; c++)
+ encode_buffer[in_pixel_index + c] += ring_buffer_entry[in_pixel_index + c] * coefficient;
+ }
+ }
+ break;
+ }
+ stbir__encode_scanline(stbir_info, output_w, (char *) output_data + output_row_start, encode_buffer, channels, alpha_channel, decode);
+}
+
+static void stbir__resample_vertical_downsample(stbir__info* stbir_info, int n)
+{
+ int x, k;
+ int output_w = stbir_info->output_w;
+ stbir__contributors* vertical_contributors = stbir_info->vertical_contributors;
+ float* vertical_coefficients = stbir_info->vertical_coefficients;
+ int channels = stbir_info->channels;
+ int ring_buffer_entries = stbir_info->ring_buffer_num_entries;
+ float* horizontal_buffer = stbir_info->horizontal_buffer;
+ int coefficient_width = stbir_info->vertical_coefficient_width;
+ int contributor = n + stbir_info->vertical_filter_pixel_margin;
+
+ float* ring_buffer = stbir_info->ring_buffer;
+ int ring_buffer_begin_index = stbir_info->ring_buffer_begin_index;
+ int ring_buffer_first_scanline = stbir_info->ring_buffer_first_scanline;
+ int ring_buffer_length = stbir_info->ring_buffer_length_bytes/sizeof(float);
+ int n0,n1;
+
+ n0 = vertical_contributors[contributor].n0;
+ n1 = vertical_contributors[contributor].n1;
+
+ STBIR_ASSERT(!stbir__use_height_upsampling(stbir_info));
+
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = k - n0;
+ int coefficient_group = coefficient_width * contributor;
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+
+ switch (channels) {
+ case 1:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 1;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ }
+ break;
+ case 2:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 2;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
+ }
+ break;
+ case 3:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 3;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
+ ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
+ }
+ break;
+ case 4:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 4;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
+ ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
+ ring_buffer_entry[in_pixel_index + 3] += horizontal_buffer[in_pixel_index + 3] * coefficient;
+ }
+ break;
+ default:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * channels;
+
+ int c;
+ for (c = 0; c < channels; c++)
+ ring_buffer_entry[in_pixel_index + c] += horizontal_buffer[in_pixel_index + c] * coefficient;
+ }
+ break;
+ }
+ }
+}
+
+static void stbir__buffer_loop_upsample(stbir__info* stbir_info)
+{
+ int y;
+ float scale_ratio = stbir_info->vertical_scale;
+ float out_scanlines_radius = stbir__filter_info_table[stbir_info->vertical_filter].support(1/scale_ratio) * scale_ratio;
+
+ STBIR_ASSERT(stbir__use_height_upsampling(stbir_info));
+
+ for (y = 0; y < stbir_info->output_h; y++)
+ {
+ float in_center_of_out = 0; // Center of the current out scanline in the in scanline space
+ int in_first_scanline = 0, in_last_scanline = 0;
+
+ stbir__calculate_sample_range_upsample(y, out_scanlines_radius, scale_ratio, stbir_info->vertical_shift, &in_first_scanline, &in_last_scanline, &in_center_of_out);
+
+ STBIR_ASSERT(in_last_scanline - in_first_scanline + 1 <= stbir_info->ring_buffer_num_entries);
+
+ if (stbir_info->ring_buffer_begin_index >= 0)
+ {
+ // Get rid of whatever we don't need anymore.
+ while (in_first_scanline > stbir_info->ring_buffer_first_scanline)
+ {
+ if (stbir_info->ring_buffer_first_scanline == stbir_info->ring_buffer_last_scanline)
+ {
+ // We just popped the last scanline off the ring buffer.
+ // Reset it to the empty state.
+ stbir_info->ring_buffer_begin_index = -1;
+ stbir_info->ring_buffer_first_scanline = 0;
+ stbir_info->ring_buffer_last_scanline = 0;
+ break;
+ }
+ else
+ {
+ stbir_info->ring_buffer_first_scanline++;
+ stbir_info->ring_buffer_begin_index = (stbir_info->ring_buffer_begin_index + 1) % stbir_info->ring_buffer_num_entries;
+ }
+ }
+ }
+
+ // Load in new ones.
+ if (stbir_info->ring_buffer_begin_index < 0)
+ stbir__decode_and_resample_upsample(stbir_info, in_first_scanline);
+
+ while (in_last_scanline > stbir_info->ring_buffer_last_scanline)
+ stbir__decode_and_resample_upsample(stbir_info, stbir_info->ring_buffer_last_scanline + 1);
+
+ // Now all buffers should be ready to write a row of vertical sampling.
+ stbir__resample_vertical_upsample(stbir_info, y);
+
+ STBIR_PROGRESS_REPORT((float)y / stbir_info->output_h);
+ }
+}
+
+static void stbir__empty_ring_buffer(stbir__info* stbir_info, int first_necessary_scanline)
+{
+ int output_stride_bytes = stbir_info->output_stride_bytes;
+ int channels = stbir_info->channels;
+ int alpha_channel = stbir_info->alpha_channel;
+ int type = stbir_info->type;
+ int colorspace = stbir_info->colorspace;
+ int output_w = stbir_info->output_w;
+ void* output_data = stbir_info->output_data;
+ int decode = STBIR__DECODE(type, colorspace);
+
+ float* ring_buffer = stbir_info->ring_buffer;
+ int ring_buffer_length = stbir_info->ring_buffer_length_bytes/sizeof(float);
+
+ if (stbir_info->ring_buffer_begin_index >= 0)
+ {
+ // Get rid of whatever we don't need anymore.
+ while (first_necessary_scanline > stbir_info->ring_buffer_first_scanline)
+ {
+ if (stbir_info->ring_buffer_first_scanline >= 0 && stbir_info->ring_buffer_first_scanline < stbir_info->output_h)
+ {
+ int output_row_start = stbir_info->ring_buffer_first_scanline * output_stride_bytes;
+ float* ring_buffer_entry = stbir__get_ring_buffer_entry(ring_buffer, stbir_info->ring_buffer_begin_index, ring_buffer_length);
+ stbir__encode_scanline(stbir_info, output_w, (char *) output_data + output_row_start, ring_buffer_entry, channels, alpha_channel, decode);
+ STBIR_PROGRESS_REPORT((float)stbir_info->ring_buffer_first_scanline / stbir_info->output_h);
+ }
+
+ if (stbir_info->ring_buffer_first_scanline == stbir_info->ring_buffer_last_scanline)
+ {
+ // We just popped the last scanline off the ring buffer.
+ // Reset it to the empty state.
+ stbir_info->ring_buffer_begin_index = -1;
+ stbir_info->ring_buffer_first_scanline = 0;
+ stbir_info->ring_buffer_last_scanline = 0;
+ break;
+ }
+ else
+ {
+ stbir_info->ring_buffer_first_scanline++;
+ stbir_info->ring_buffer_begin_index = (stbir_info->ring_buffer_begin_index + 1) % stbir_info->ring_buffer_num_entries;
+ }
+ }
+ }
+}
+
+static void stbir__buffer_loop_downsample(stbir__info* stbir_info)
+{
+ int y;
+ float scale_ratio = stbir_info->vertical_scale;
+ int output_h = stbir_info->output_h;
+ float in_pixels_radius = stbir__filter_info_table[stbir_info->vertical_filter].support(scale_ratio) / scale_ratio;
+ int pixel_margin = stbir_info->vertical_filter_pixel_margin;
+ int max_y = stbir_info->input_h + pixel_margin;
+
+ STBIR_ASSERT(!stbir__use_height_upsampling(stbir_info));
+
+ for (y = -pixel_margin; y < max_y; y++)
+ {
+ float out_center_of_in; // Center of the current in scanline in the out scanline space
+ int out_first_scanline, out_last_scanline;
+
+ stbir__calculate_sample_range_downsample(y, in_pixels_radius, scale_ratio, stbir_info->vertical_shift, &out_first_scanline, &out_last_scanline, &out_center_of_in);
+
+ STBIR_ASSERT(out_last_scanline - out_first_scanline + 1 <= stbir_info->ring_buffer_num_entries);
+
+ if (out_last_scanline < 0 || out_first_scanline >= output_h)
+ continue;
+
+ stbir__empty_ring_buffer(stbir_info, out_first_scanline);
+
+ stbir__decode_and_resample_downsample(stbir_info, y);
+
+ // Load in new ones.
+ if (stbir_info->ring_buffer_begin_index < 0)
+ stbir__add_empty_ring_buffer_entry(stbir_info, out_first_scanline);
+
+ while (out_last_scanline > stbir_info->ring_buffer_last_scanline)
+ stbir__add_empty_ring_buffer_entry(stbir_info, stbir_info->ring_buffer_last_scanline + 1);
+
+ // Now the horizontal buffer is ready to write to all ring buffer rows.
+ stbir__resample_vertical_downsample(stbir_info, y);
+ }
+
+ stbir__empty_ring_buffer(stbir_info, stbir_info->output_h);
+}
+
+static void stbir__setup(stbir__info *info, int input_w, int input_h, int output_w, int output_h, int channels)
+{
+ info->input_w = input_w;
+ info->input_h = input_h;
+ info->output_w = output_w;
+ info->output_h = output_h;
+ info->channels = channels;
+}
+
+static void stbir__calculate_transform(stbir__info *info, float s0, float t0, float s1, float t1, float *transform)
+{
+ info->s0 = s0;
+ info->t0 = t0;
+ info->s1 = s1;
+ info->t1 = t1;
+
+ if (transform)
+ {
+ info->horizontal_scale = transform[0];
+ info->vertical_scale = transform[1];
+ info->horizontal_shift = transform[2];
+ info->vertical_shift = transform[3];
+ }
+ else
+ {
+ info->horizontal_scale = ((float)info->output_w / info->input_w) / (s1 - s0);
+ info->vertical_scale = ((float)info->output_h / info->input_h) / (t1 - t0);
+
+ info->horizontal_shift = s0 * info->output_w / (s1 - s0);
+ info->vertical_shift = t0 * info->output_h / (t1 - t0);
+ }
+}
+
+static void stbir__choose_filter(stbir__info *info, stbir_filter h_filter, stbir_filter v_filter)
+{
+ if (h_filter == 0)
+ h_filter = stbir__use_upsampling(info->horizontal_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
+ if (v_filter == 0)
+ v_filter = stbir__use_upsampling(info->vertical_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
+ info->horizontal_filter = h_filter;
+ info->vertical_filter = v_filter;
+}
+
+static stbir_uint32 stbir__calculate_memory(stbir__info *info)
+{
+ int pixel_margin = stbir__get_filter_pixel_margin(info->horizontal_filter, info->horizontal_scale);
+ int filter_height = stbir__get_filter_pixel_width(info->vertical_filter, info->vertical_scale);
+
+ info->horizontal_num_contributors = stbir__get_contributors(info->horizontal_scale, info->horizontal_filter, info->input_w, info->output_w);
+ info->vertical_num_contributors = stbir__get_contributors(info->vertical_scale , info->vertical_filter , info->input_h, info->output_h);
+
+ // One extra entry because floating point precision problems sometimes cause an extra one to be necessary.
+ info->ring_buffer_num_entries = filter_height + 1;
+
+ info->horizontal_contributors_size = info->horizontal_num_contributors * sizeof(stbir__contributors);
+ info->horizontal_coefficients_size = stbir__get_total_horizontal_coefficients(info) * sizeof(float);
+ info->vertical_contributors_size = info->vertical_num_contributors * sizeof(stbir__contributors);
+ info->vertical_coefficients_size = stbir__get_total_vertical_coefficients(info) * sizeof(float);
+ info->decode_buffer_size = (info->input_w + pixel_margin * 2) * info->channels * sizeof(float);
+ info->horizontal_buffer_size = info->output_w * info->channels * sizeof(float);
+ info->ring_buffer_size = info->output_w * info->channels * info->ring_buffer_num_entries * sizeof(float);
+ info->encode_buffer_size = info->output_w * info->channels * sizeof(float);
+
+ STBIR_ASSERT(info->horizontal_filter != 0);
+ STBIR_ASSERT(info->horizontal_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table)); // this now happens too late
+ STBIR_ASSERT(info->vertical_filter != 0);
+ STBIR_ASSERT(info->vertical_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table)); // this now happens too late
+
+ if (stbir__use_height_upsampling(info))
+ // The horizontal buffer is for when we're downsampling the height and we
+ // can't output the result of sampling the decode buffer directly into the
+ // ring buffers.
+ info->horizontal_buffer_size = 0;
+ else
+ // The encode buffer is to retain precision in the height upsampling method
+ // and isn't used when height downsampling.
+ info->encode_buffer_size = 0;
+
+ return info->horizontal_contributors_size + info->horizontal_coefficients_size
+ + info->vertical_contributors_size + info->vertical_coefficients_size
+ + info->decode_buffer_size + info->horizontal_buffer_size
+ + info->ring_buffer_size + info->encode_buffer_size;
+}
+
+static int stbir__resize_allocated(stbir__info *info,
+ const void* input_data, int input_stride_in_bytes,
+ void* output_data, int output_stride_in_bytes,
+ int alpha_channel, stbir_uint32 flags, stbir_datatype type,
+ stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace,
+ void* tempmem, size_t tempmem_size_in_bytes)
+{
+ size_t memory_required = stbir__calculate_memory(info);
+
+ int width_stride_input = input_stride_in_bytes ? input_stride_in_bytes : info->channels * info->input_w * stbir__type_size[type];
+ int width_stride_output = output_stride_in_bytes ? output_stride_in_bytes : info->channels * info->output_w * stbir__type_size[type];
+
+#ifdef STBIR_DEBUG_OVERWRITE_TEST
+#define OVERWRITE_ARRAY_SIZE 8
+ unsigned char overwrite_output_before_pre[OVERWRITE_ARRAY_SIZE];
+ unsigned char overwrite_tempmem_before_pre[OVERWRITE_ARRAY_SIZE];
+ unsigned char overwrite_output_after_pre[OVERWRITE_ARRAY_SIZE];
+ unsigned char overwrite_tempmem_after_pre[OVERWRITE_ARRAY_SIZE];
+
+ size_t begin_forbidden = width_stride_output * (info->output_h - 1) + info->output_w * info->channels * stbir__type_size[type];
+ memcpy(overwrite_output_before_pre, &((unsigned char*)output_data)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE);
+ memcpy(overwrite_output_after_pre, &((unsigned char*)output_data)[begin_forbidden], OVERWRITE_ARRAY_SIZE);
+ memcpy(overwrite_tempmem_before_pre, &((unsigned char*)tempmem)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE);
+ memcpy(overwrite_tempmem_after_pre, &((unsigned char*)tempmem)[tempmem_size_in_bytes], OVERWRITE_ARRAY_SIZE);
+#endif
+
+ STBIR_ASSERT(info->channels >= 0);
+ STBIR_ASSERT(info->channels <= STBIR_MAX_CHANNELS);
+
+ if (info->channels < 0 || info->channels > STBIR_MAX_CHANNELS)
+ return 0;
+
+ STBIR_ASSERT(info->horizontal_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table));
+ STBIR_ASSERT(info->vertical_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table));
+
+ if (info->horizontal_filter >= STBIR__ARRAY_SIZE(stbir__filter_info_table))
+ return 0;
+ if (info->vertical_filter >= STBIR__ARRAY_SIZE(stbir__filter_info_table))
+ return 0;
+
+ if (alpha_channel < 0)
+ flags |= STBIR_FLAG_ALPHA_USES_COLORSPACE | STBIR_FLAG_ALPHA_PREMULTIPLIED;
+
+ if (!(flags&STBIR_FLAG_ALPHA_USES_COLORSPACE) || !(flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)) {
+ STBIR_ASSERT(alpha_channel >= 0 && alpha_channel < info->channels);
+ }
+
+ if (alpha_channel >= info->channels)
+ return 0;
+
+ STBIR_ASSERT(tempmem);
+
+ if (!tempmem)
+ return 0;
+
+ STBIR_ASSERT(tempmem_size_in_bytes >= memory_required);
+
+ if (tempmem_size_in_bytes < memory_required)
+ return 0;
+
+ memset(tempmem, 0, tempmem_size_in_bytes);
+
+ info->input_data = input_data;
+ info->input_stride_bytes = width_stride_input;
+
+ info->output_data = output_data;
+ info->output_stride_bytes = width_stride_output;
+
+ info->alpha_channel = alpha_channel;
+ info->flags = flags;
+ info->type = type;
+ info->edge_horizontal = edge_horizontal;
+ info->edge_vertical = edge_vertical;
+ info->colorspace = colorspace;
+
+ info->horizontal_coefficient_width = stbir__get_coefficient_width (info->horizontal_filter, info->horizontal_scale);
+ info->vertical_coefficient_width = stbir__get_coefficient_width (info->vertical_filter , info->vertical_scale );
+ info->horizontal_filter_pixel_width = stbir__get_filter_pixel_width (info->horizontal_filter, info->horizontal_scale);
+ info->vertical_filter_pixel_width = stbir__get_filter_pixel_width (info->vertical_filter , info->vertical_scale );
+ info->horizontal_filter_pixel_margin = stbir__get_filter_pixel_margin(info->horizontal_filter, info->horizontal_scale);
+ info->vertical_filter_pixel_margin = stbir__get_filter_pixel_margin(info->vertical_filter , info->vertical_scale );
+
+ info->ring_buffer_length_bytes = info->output_w * info->channels * sizeof(float);
+ info->decode_buffer_pixels = info->input_w + info->horizontal_filter_pixel_margin * 2;
+
+#define STBIR__NEXT_MEMPTR(current, newtype) (newtype*)(((unsigned char*)current) + current##_size)
+
+ info->horizontal_contributors = (stbir__contributors *) tempmem;
+ info->horizontal_coefficients = STBIR__NEXT_MEMPTR(info->horizontal_contributors, float);
+ info->vertical_contributors = STBIR__NEXT_MEMPTR(info->horizontal_coefficients, stbir__contributors);
+ info->vertical_coefficients = STBIR__NEXT_MEMPTR(info->vertical_contributors, float);
+ info->decode_buffer = STBIR__NEXT_MEMPTR(info->vertical_coefficients, float);
+
+ if (stbir__use_height_upsampling(info))
+ {
+ info->horizontal_buffer = NULL;
+ info->ring_buffer = STBIR__NEXT_MEMPTR(info->decode_buffer, float);
+ info->encode_buffer = STBIR__NEXT_MEMPTR(info->ring_buffer, float);
+
+ STBIR_ASSERT((size_t)STBIR__NEXT_MEMPTR(info->encode_buffer, unsigned char) == (size_t)tempmem + tempmem_size_in_bytes);
+ }
+ else
+ {
+ info->horizontal_buffer = STBIR__NEXT_MEMPTR(info->decode_buffer, float);
+ info->ring_buffer = STBIR__NEXT_MEMPTR(info->horizontal_buffer, float);
+ info->encode_buffer = NULL;
+
+ STBIR_ASSERT((size_t)STBIR__NEXT_MEMPTR(info->ring_buffer, unsigned char) == (size_t)tempmem + tempmem_size_in_bytes);
+ }
+
+#undef STBIR__NEXT_MEMPTR
+
+ // This signals that the ring buffer is empty
+ info->ring_buffer_begin_index = -1;
+
+ stbir__calculate_filters(info->horizontal_contributors, info->horizontal_coefficients, info->horizontal_filter, info->horizontal_scale, info->horizontal_shift, info->input_w, info->output_w);
+ stbir__calculate_filters(info->vertical_contributors, info->vertical_coefficients, info->vertical_filter, info->vertical_scale, info->vertical_shift, info->input_h, info->output_h);
+
+ STBIR_PROGRESS_REPORT(0);
+
+ if (stbir__use_height_upsampling(info))
+ stbir__buffer_loop_upsample(info);
+ else
+ stbir__buffer_loop_downsample(info);
+
+ STBIR_PROGRESS_REPORT(1);
+
+#ifdef STBIR_DEBUG_OVERWRITE_TEST
+ STBIR_ASSERT(memcmp(overwrite_output_before_pre, &((unsigned char*)output_data)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE) == 0);
+ STBIR_ASSERT(memcmp(overwrite_output_after_pre, &((unsigned char*)output_data)[begin_forbidden], OVERWRITE_ARRAY_SIZE) == 0);
+ STBIR_ASSERT(memcmp(overwrite_tempmem_before_pre, &((unsigned char*)tempmem)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE) == 0);
+ STBIR_ASSERT(memcmp(overwrite_tempmem_after_pre, &((unsigned char*)tempmem)[tempmem_size_in_bytes], OVERWRITE_ARRAY_SIZE) == 0);
+#endif
+
+ return 1;
+}
+
+
+static int stbir__resize_arbitrary(
+ void *alloc_context,
+ const void* input_data, int input_w, int input_h, int input_stride_in_bytes,
+ void* output_data, int output_w, int output_h, int output_stride_in_bytes,
+ float s0, float t0, float s1, float t1, float *transform,
+ int channels, int alpha_channel, stbir_uint32 flags, stbir_datatype type,
+ stbir_filter h_filter, stbir_filter v_filter,
+ stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace)
+{
+ stbir__info info;
+ int result;
+ size_t memory_required;
+ void* extra_memory;
+
+ stbir__setup(&info, input_w, input_h, output_w, output_h, channels);
+ stbir__calculate_transform(&info, s0,t0,s1,t1,transform);
+ stbir__choose_filter(&info, h_filter, v_filter);
+ memory_required = stbir__calculate_memory(&info);
+ extra_memory = STBIR_MALLOC(memory_required, alloc_context);
+
+ if (!extra_memory)
+ return 0;
+
+ result = stbir__resize_allocated(&info, input_data, input_stride_in_bytes,
+ output_data, output_stride_in_bytes,
+ alpha_channel, flags, type,
+ edge_horizontal, edge_vertical,
+ colorspace, extra_memory, memory_required);
+
+ STBIR_FREE(extra_memory, alloc_context);
+
+ return result;
+}
+
+STBIRDEF int stbir_resize_uint8( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,-1,0, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
+}
+
+STBIRDEF int stbir_resize_float( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,-1,0, STBIR_TYPE_FLOAT, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
+}
+
+STBIRDEF int stbir_resize_uint8_srgb(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_SRGB);
+}
+
+STBIRDEF int stbir_resize_uint8_srgb_edgemode(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ edge_wrap_mode, edge_wrap_mode, STBIR_COLORSPACE_SRGB);
+}
+
+STBIRDEF int stbir_resize_uint8_generic( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
+ edge_wrap_mode, edge_wrap_mode, space);
+}
+
+STBIRDEF int stbir_resize_uint16_generic(const stbir_uint16 *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ stbir_uint16 *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT16, filter, filter,
+ edge_wrap_mode, edge_wrap_mode, space);
+}
+
+
+STBIRDEF int stbir_resize_float_generic( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_FLOAT, filter, filter,
+ edge_wrap_mode, edge_wrap_mode, space);
+}
+
+
+STBIRDEF int stbir_resize( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
+ edge_mode_horizontal, edge_mode_vertical, space);
+}
+
+
+STBIRDEF int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float x_scale, float y_scale,
+ float x_offset, float y_offset)
+{
+ float transform[4];
+ transform[0] = x_scale;
+ transform[1] = y_scale;
+ transform[2] = x_offset;
+ transform[3] = y_offset;
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,transform,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
+ edge_mode_horizontal, edge_mode_vertical, space);
+}
+
+STBIRDEF int stbir_resize_region( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float s0, float t0, float s1, float t1)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ s0,t0,s1,t1,NULL,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
+ edge_mode_horizontal, edge_mode_vertical, space);
+}
+
+#endif // STB_IMAGE_RESIZE_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/deprecated/stretch_test.c b/vendor/stb/deprecated/stretch_test.c
new file mode 100644
index 0000000..772237c
--- /dev/null
+++ b/vendor/stb/deprecated/stretch_test.c
@@ -0,0 +1,29 @@
+// check that stb_truetype compiles with no stb_rect_pack.h
+#define STB_TRUETYPE_IMPLEMENTATION
+#include "stb_truetype.h"
+
+#define STB_DS_IMPLEMENTATION
+#include "stb_ds.h"
+#include <assert.h>
+
+int main(int argc, char **argv)
+{
+ int i;
+ int *arr = NULL;
+
+ for (i=0; i < 1000000; ++i)
+ arrput(arr, i);
+
+ assert(arrlen(arr) == 1000000);
+ for (i=0; i < 1000000; ++i)
+ assert(arr[i] == i);
+
+ arrfree(arr);
+ arr = NULL;
+
+ for (i=0; i < 1000; ++i)
+ arrput(arr, 1000);
+ assert(arrlen(arr) == 1000);
+
+ return 0;
+}
\ No newline at end of file
diff --git a/vendor/stb/deprecated/stretchy_buffer.h b/vendor/stb/deprecated/stretchy_buffer.h
new file mode 100644
index 0000000..c0880ff
--- /dev/null
+++ b/vendor/stb/deprecated/stretchy_buffer.h
@@ -0,0 +1,263 @@
+// stretchy_buffer.h - v1.04 - public domain - nothings.org/stb
+// a vector<>-like dynamic array for C
+//
+// version history:
+// 1.04 - fix warning
+// 1.03 - compile as C++ maybe
+// 1.02 - tweaks to syntax for no good reason
+// 1.01 - added a "common uses" documentation section
+// 1.0 - fixed bug in the version I posted prematurely
+// 0.9 - rewrite to try to avoid strict-aliasing optimization
+// issues, but won't compile as C++
+//
+// Will probably not work correctly with strict-aliasing optimizations.
+//
+// The idea:
+//
+// This implements an approximation to C++ vector<> for C, in that it
+// provides a generic definition for dynamic arrays which you can
+// still access in a typesafe way using arr[i] or *(arr+i). However,
+// it is simply a convenience wrapper around the common idiom
+// of keeping a set of variables (in a struct or globals) which store
+// - pointer to array
+// - the length of the "in-use" part of the array
+// - the current size of the allocated array
+//
+// I find it to be the single most useful non-built-in-structure when
+// programming in C (hash tables a close second), but to be clear
+// it lacks many of the capabilities of C++ vector<>: there is no
+// range checking, the object address isn't stable (see next section
+// for details), the set of methods available is small (although
+// the file stb.h has another implementation of stretchy buffers
+// called 'stb_arr' which provides more methods, e.g. for insertion
+// and deletion).
+//
+// How to use:
+//
+// Unlike other stb header file libraries, there is no need to
+// define an _IMPLEMENTATION symbol. Every #include creates as
+// much implementation as is needed.
+//
+// stretchy_buffer.h does not define any types, so you do not
+// need to #include it before defining data types that are
+// stretchy buffers, only in files that *manipulate* stretchy
+// buffers.
+//
+// If you want a stretchy buffer aka dynamic array containing
+// objects of TYPE, declare such an array as:
+//
+// TYPE *myarray = NULL;
+//
+// (There is no typesafe way to distinguish between stretchy
+// buffers and regular arrays/pointers; this is necessary to
+// make ordinary array indexing work on these objects.)
+//
+// Unlike C++ vector<>, the stretchy_buffer has the same
+// semantics as an object that you manually malloc and realloc.
+// The pointer may relocate every time you add a new object
+// to it, so you:
+//
+// 1. can't take long-term pointers to elements of the array
+// 2. have to return the pointer from functions which might expand it
+// (either as a return value or by storing it to a ptr-to-ptr)
+//
+// Now you can do the following things with this array:
+//
+// sb_free(TYPE *a) free the array
+// sb_count(TYPE *a) the number of elements in the array
+// sb_push(TYPE *a, TYPE v) adds v on the end of the array, a la push_back
+// sb_add(TYPE *a, int n) adds n uninitialized elements at end of array & returns pointer to first added
+// sb_last(TYPE *a) returns an lvalue of the last item in the array
+// a[n] access the nth (counting from 0) element of the array
+//
+// #define NO_STRETCHY_BUFFER_SHORT_NAMES to only export
+// names of the form 'stb_sb_' if you have a name that would
+// otherwise collide.
+//
+// Note that these are all macros and many of them evaluate
+// their arguments more than once, so the arguments should
+// be side-effect-free.
+//
+// Note that 'TYPE *a' in sb_push and sb_add must be lvalues
+// so that the library can overwrite the existing pointer if
+// the object has to be reallocated.
+//
+// In an out-of-memory condition, the code will try to
+// set up a null-pointer or otherwise-invalid-pointer
+// exception to happen later. It's possible optimizing
+// compilers could detect this write-to-null statically
+// and optimize away some of the code, but it should only
+// be along the failure path. Nevertheless, for more security
+// in the face of such compilers, #define STRETCHY_BUFFER_OUT_OF_MEMORY
+// to a statement such as assert(0) or exit(1) or something
+// to force a failure when out-of-memory occurs.
+//
+// Common use:
+//
+// The main application for this is when building a list of
+// things with an unknown quantity, either due to loading from
+// a file or through a process which produces an unpredictable
+// number.
+//
+// My most common idiom is something like:
+//
+// SomeStruct *arr = NULL;
+// while (something)
+// {
+// SomeStruct new_one;
+// new_one.whatever = whatever;
+// new_one.whatup = whatup;
+// new_one.foobar = barfoo;
+// sb_push(arr, new_one);
+// }
+//
+// and various closely-related factorings of that. For example,
+// you might have several functions to create/init new SomeStructs,
+// and if you use the above idiom, you might prefer to make them
+// return structs rather than take non-const-pointers-to-structs,
+// so you can do things like:
+//
+// SomeStruct *arr = NULL;
+// while (something)
+// {
+// if (case_A) {
+// sb_push(arr, some_func1());
+// } else if (case_B) {
+// sb_push(arr, some_func2());
+// } else {
+// sb_push(arr, some_func3());
+// }
+// }
+//
+// Note that the above relies on the fact that sb_push doesn't
+// evaluate its second argument more than once. The macros do
+// evaluate the *array* argument multiple times, and numeric
+// arguments may be evaluated multiple times, but you can rely
+// on the second argument of sb_push being evaluated only once.
+//
+// Of course, you don't have to store bare objects in the array;
+// if you need the objects to have stable pointers, store an array
+// of pointers instead:
+//
+// SomeStruct **arr = NULL;
+// while (something)
+// {
+// SomeStruct *new_one = malloc(sizeof(*new_one));
+// new_one->whatever = whatever;
+// new_one->whatup = whatup;
+// new_one->foobar = barfoo;
+// sb_push(arr, new_one);
+// }
+//
+// How it works:
+//
+// A long-standing tradition in things like malloc implementations
+// is to store extra data before the beginning of the block returned
+// to the user. The stretchy buffer implementation here uses the
+// same trick; the current-count and current-allocation-size are
+// stored before the beginning of the array returned to the user.
+// (This means you can't directly free() the pointer, because the
+// allocated pointer is different from the type-safe pointer provided
+// to the user.)
+//
+// The details are trivial and implementation is straightforward;
+// the main trick is in realizing in the first place that it's
+// possible to do this in a generic, type-safe way in C.
+//
+// Contributors:
+//
+// Timothy Wright (github:ZenToad)
+//
+// LICENSE
+//
+// See end of file for license information.
+
+#ifndef STB_STRETCHY_BUFFER_H_INCLUDED
+#define STB_STRETCHY_BUFFER_H_INCLUDED
+
+#ifndef NO_STRETCHY_BUFFER_SHORT_NAMES
+#define sb_free stb_sb_free
+#define sb_push stb_sb_push
+#define sb_count stb_sb_count
+#define sb_add stb_sb_add
+#define sb_last stb_sb_last
+#endif
+
+#define stb_sb_free(a) ((a) ? free(stb__sbraw(a)),0 : 0)
+#define stb_sb_push(a,v) (stb__sbmaybegrow(a,1), (a)[stb__sbn(a)++] = (v))
+#define stb_sb_count(a) ((a) ? stb__sbn(a) : 0)
+#define stb_sb_add(a,n) (stb__sbmaybegrow(a,n), stb__sbn(a)+=(n), &(a)[stb__sbn(a)-(n)])
+#define stb_sb_last(a) ((a)[stb__sbn(a)-1])
+
+#define stb__sbraw(a) ((int *) (void *) (a) - 2)
+#define stb__sbm(a) stb__sbraw(a)[0]
+#define stb__sbn(a) stb__sbraw(a)[1]
+
+#define stb__sbneedgrow(a,n) ((a)==0 || stb__sbn(a)+(n) >= stb__sbm(a))
+#define stb__sbmaybegrow(a,n) (stb__sbneedgrow(a,(n)) ? stb__sbgrow(a,n) : 0)
+#define stb__sbgrow(a,n) (*((void **)&(a)) = stb__sbgrowf((a), (n), sizeof(*(a))))
+
+#include <stdlib.h>
+
+static void * stb__sbgrowf(void *arr, int increment, int itemsize)
+{
+ int dbl_cur = arr ? 2*stb__sbm(arr) : 0;
+ int min_needed = stb_sb_count(arr) + increment;
+ int m = dbl_cur > min_needed ? dbl_cur : min_needed;
+ int *p = (int *) realloc(arr ? stb__sbraw(arr) : 0, itemsize * m + sizeof(int)*2);
+ if (p) {
+ if (!arr)
+ p[1] = 0;
+ p[0] = m;
+ return p+2;
+ } else {
+ #ifdef STRETCHY_BUFFER_OUT_OF_MEMORY
+ STRETCHY_BUFFER_OUT_OF_MEMORY ;
+ #endif
+ return (void *) (2*sizeof(int)); // try to force a NULL pointer exception later
+ }
+}
+#endif // STB_STRETCHY_BUFFER_H_INCLUDED
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/deprecated/stretchy_buffer.txt b/vendor/stb/deprecated/stretchy_buffer.txt
new file mode 100644
index 0000000..dcd747e
--- /dev/null
+++ b/vendor/stb/deprecated/stretchy_buffer.txt
@@ -0,0 +1,28 @@
+// stretchy buffer // init: NULL // free: sbfree() // push_back: sbpush() // size: sbcount() //
+#define sbfree(a) ((a) ? free(stb__sbraw(a)),0 : 0)
+#define sbpush(a,v) (stb__sbmaybegrow(a,1), (a)[stb__sbn(a)++] = (v))
+#define sbcount(a) ((a) ? stb__sbn(a) : 0)
+#define sbadd(a,n) (stb__sbmaybegrow(a,n), stb__sbn(a)+=(n), &(a)[stb__sbn(a)-(n)])
+#define sblast(a) ((a)[stb__sbn(a)-1])
+
+#include <stdlib.h>
+#include <assert.h>
+#define stb__sbraw(a) ((int *) (a) - 2)
+#define stb__sbm(a) stb__sbraw(a)[0]
+#define stb__sbn(a) stb__sbraw(a)[1]
+
+#define stb__sbneedgrow(a,n) ((a)==0 || stb__sbn(a)+n >= stb__sbm(a))
+#define stb__sbmaybegrow(a,n) (stb__sbneedgrow(a,(n)) ? stb__sbgrow(a,n) : 0)
+#define stb__sbgrow(a,n) stb__sbgrowf((void **) &(a), (n), sizeof(*(a)))
+
+static void stb__sbgrowf(void **arr, int increment, int itemsize)
+{
+ int m = *arr ? 2*stb__sbm(*arr)+increment : increment+1;
+ void *p = realloc(*arr ? stb__sbraw(*arr) : 0, itemsize * m + sizeof(int)*2);
+ assert(p);
+ if (p) {
+ if (!*arr) ((int *) p)[1] = 0;
+ *arr = (void *) ((int *) p + 2);
+ stb__sbm(*arr) = m;
+ }
+}
+
diff --git a/vendor/stb/docs/other_libs.md b/vendor/stb/docs/other_libs.md
new file mode 100644
index 0000000..62f379c
--- /dev/null
+++ b/vendor/stb/docs/other_libs.md
@@ -0,0 +1 @@
+Moved to https://github.com/nothings/single_file_libs
\ No newline at end of file
diff --git a/vendor/stb/docs/stb_howto.txt b/vendor/stb/docs/stb_howto.txt
new file mode 100644
index 0000000..a969b54
--- /dev/null
+++ b/vendor/stb/docs/stb_howto.txt
@@ -0,0 +1,185 @@
+Lessons learned about how to make a header-file library
+V1.0
+September 2013 Sean Barrett
+
+Things to do in an stb-style header-file library,
+and rationales:
+
+
+1. #define LIBRARYNAME_IMPLEMENTATION
+
+Use a symbol like the above to control creating
+the implementation. (I used a far-less-clear name
+in my first header-file library; it became
+clear that was a mistake once I had multiple
+libraries.)
+
+Include a "header-file" section with header-file
+guards and declarations for all the functions,
+but only guard the implementation with LIBRARYNAME_IMPLEMENTATION,
+not the header-file guard. That way, if client's
+header file X includes your header file for
+declarations, they can still include header file X
+in the source file that creates the implementation;
+if you guard the implementation too, then the first
+include (before the #define) creates the declarations,
+and the second one (after the #define) does nothing.
+
+
+2. AVOID DEPENDENCIES
+
+Don't rely on anything other than the C standard libraries.
+
+(If you're creating a library specifically to leverage/wrap
+some other library, then obviously you can rely on that
+library. But if that library is public domain, you might
+be better off directly embedding the source, to reduce
+dependencies for your clients. But of course now you have
+to update whenever that library updates.)
+
+If you use stdlib, consider wrapping all stdlib calls in
+macros, and then conditionally define those macros to the
+stdlib function, allowing the user to replace them.
+
+For functions with side effects, like memory allocations,
+consider letting the user pass in a context and pass
+that in to the macros. (The stdlib versions will ignore
+the parameter.) Otherwise, users may have to use global
+or thread-local variables to achieve the same effect.
+
+
+3. AVOID MALLOC
+
+You can't always do this, but when you can, embedded developers
+will appreciate it. I almost never bother avoiding it, as it's
+too much work (and in some cases is pretty infeasible;
+see http://nothings.org/gamedev/font_rendering_malloc.txt ).
+But it's definitely one of the things I've gotten
+the most pushback on from potential users.
+
+
+4. ALLOW STATIC IMPLEMENTATION
+
+Have a #define which makes function declarations and
+function definitions static. This makes the implementation
+private to the source file that creates it. This allows
+people to use your library multiple times in their project
+without collision. (This is only necessary if your library
+has configuration macros or global state, or if your
+library has multiple versions that are not backwards
+compatible. I've run into both of those cases.)
+
+
+5. MAKE ACCESSIBLE FROM C
+
+Making your code accessible from C instead of C++ (i.e.
+either coding in C, or using extern "C") makes it more
+straightforward to be used in C and in other languages,
+which often only have support for C bindings, not C++.
+(One of the earliest results I found in googling for
+stb_image was a Haskell wrapper.) Otherwise, people
+have to wrap it in another set of function calls, and
+the whole point here is to make it convenient for people
+to use, isn't it? (See below.)
+
+I prefer to code entirely in C, so the source file that
+instantiates the implementation can be C itself, for
+those crazy people out there who are programming in C.
+But it's probably not a big hardship for a C programmer
+to create a single C++ source file to instantiate your
+library.
+
+
+6. NAMESPACE PRIVATE FUNCTIONS
+
+Try to avoid having names in your source code that
+will cause conflicts with identical names in client
+code. You can do this either by namespacing in C++,
+or prefixing with your library name in C.
+
+In C, generally, I use the same prefix for API
+functions and private symbols, such as "stbtt_"
+for stb_truetype; but private functions (and
+static globals) use a second underscore as
+in "stbtt__" to further minimize the chance of
+additional collisions in the unlikely but not
+impossible event that users write wrapper
+functions that have names of the form "stbtt_".
+(Consider the user that has used "stbtt_foo"
+*successfully*, and then upgrades to a new
+version of your library which has a new private
+function named either "stbtt_foo" or "stbtt__foo".)
+
+Note that the double-underscore is reserved for
+use by the compiler, but (1) there is nothing
+reserved for "middleware", i.e. libraries
+desiring to avoid conflicts with user symbols
+have no other good options, and (2) in practice
+no compilers use double-underscore in the middle
+rather than the beginning/end. (Unfortunately,
+there is at least one videogame-console compiler that
+will warn about double-underscores by default.)
+
+
+7. EASY-TO-COMPLY LICENSE
+
+I make my libraries public domain. You don't have to.
+But my goal in releasing stb-style libraries is to
+reduce friction for potential users as much as
+possible. That means:
+
+ a. easy to build (what this file is mostly about)
+ b. easy to invoke (which requires good API design)
+ c. easy to deploy (which is about licensing)
+
+I choose to place all my libraries in the public
+domain, abjuring copyright, rather than license
+the libraries. This has some benefits and some
+drawbacks.
+
+Any license which is "viral" to modifications
+causes worries for lawyers, even if their programmers
+aren't modifying it.
+
+Any license which requires crediting in documentation
+adds friction which can add up. Valve used to have
+a page with a list of all of these on their web site,
+and it was insane, and obviously nobody ever looked
+at it so why would you care whether your credit appeared
+there?
+
+Permissive licenses like zlib and BSD license are
+perfectly reasonable, but they are very wordy and
+have only two benefits over public domain: legally-mandated
+attribution and liability-control. I do not believe these
+are worth the excessive verbosity and user-unfriendliness
+these licenses induce, especially in the single-file
+case where those licenses tend to be at the top of
+the file, the first thing you see. (To the specific
+points, I have had no trouble receiving attribution
+for my libraries; liability in the face of no explicit
+disclaimer of liability is an open question.)
+
+However, public domain has frictions of its own, because
+public domain declarations aren't necessarily recognized
+in the USA and some other locations. For that reason,
+I recommend a declaration along these lines:
+
+// This software is dual-licensed to the public domain and under the following
+// license: you are granted a perpetual, irrevocable license to copy, modify,
+// publish, and distribute this file as you see fit.
+
+I typically place this declaration at the end of the initial
+comment block of the file and just say 'public domain'
+at the top.
+
+I have had people say they couldn't use one of my
+libraries because it was only "public domain" and didn't
+have the additional fallback clause, who asked if
+I could dual-license it under a traditional license.
+
+My answer: they can create a derivative work by
+modifying one character, and then license that however
+they like. (Indeed, *adding* the zlib or BSD license
+would be such a modification!) Unfortunately, their
+lawyers reportedly didn't like that answer. :(
diff --git a/vendor/stb/docs/stb_voxel_render_interview.md b/vendor/stb/docs/stb_voxel_render_interview.md
new file mode 100644
index 0000000..7071466
--- /dev/null
+++ b/vendor/stb/docs/stb_voxel_render_interview.md
@@ -0,0 +1,173 @@
+# An interview with STB about stb_voxel_render.h
+
+**Q:**
+I suppose you really like Minecraft?
+
+**A:**
+Not really. I mean, I do own it and play it some, and
+I do watch YouTube videos of other people playing it
+once in a while, but I'm not saying it's that great.
+
+But I do love voxels. I've been playing with voxel rendering
+since the mid-late 90's when we were still doing software
+rendering and thinking maybe polygons weren't the answer.
+Once GPUs came along that kind of died off, at least until
+Minecraft brought it back to attention.
+
+**Q:**
+Do you expect people will make a lot of Minecraft clones
+with this?
+
+**A:**
+I hope not!
+
+For one thing, it's a terrible idea for the
+developer. Remember before Minecraft was on the Xbox 360,
+there were a ton of "indie" clones (some maybe making
+decent money even), but then the real Minecraft came out
+and just crushed them (as far as I know). It's just not
+something you really want to compete with.
+
+The reason I made this library is because I'd like
+to see more games with Minecraft's *art style*, not
+necessarily its *gameplay*.
+
+I can understand the urge to clone the gameplay. When
+you have a world made of voxels/blocks, there are a
+few things that become incredibly easy to do that would
+otherwise be very hard (at least for an indie) to do in 3D.
+One thing is that procedural generation becomes much easier.
+Another is that destructible environments are easy. Another
+is that you have a world where your average user can build
+stuff that they find satisfactory.
+
+Minecraft is at a sort of local maximum, a sweet spot, where
+it leverages all of those easy-to-dos. And so I'm sure it's
+hard to look at the space of 'games using voxels' and move
+away from that local maximum, to give up some of that.
+But I think that's what people should do.
+
+**Q:**
+So what else can people do with stb_voxel_render?
+
+**A:**
+All of those benefits I mentioned above are still valid even
+if you stay away from the sweet spot. You can make a 3D roguelike
+without player-creation/destruction that uses procedural generation.
+You could make a shooter with pre-designed maps but destructible
+environments.
+
+And I'm sure there are other possible benefits to using voxels/blocks.
+Hopefully this will make it easier for people to explore the space.
+
+The library has a pretty wide range of features to allow
+people to come up with some distinctive looks. For example,
+the art style of Continue?9876543210 was one of the inspirations
+for trying to make the multitexturing capabilities flexible.
+I'm terrible at art, so this isn't really something I can
+come up with myself, but I tried to put in flexible
+technology that could be used multiple ways.
+
+One thing I did intentionally was try to make it possible to
+make nicer looking ground terrain, using the half-height
+slopes and "weird slopes". There are Minecraft mods with
+drivable cars and they just go up these blocky slopes and,
+like, what? So I wanted you to be able to make smoother
+terrain, either just for the look, or for vehicles etc.
+Also, you can spatially cross-fade between two ground textures for
+that classic bad dirt/grass transition that has shipped
+in plenty of professional games. Of course, you could
+just use a separate non-voxel ground renderer for all of
+this. But this way, you can seamlessly integrate everything
+else with it. E.g. in your authoring tool (or procedural
+generation) you can make smooth ground and then cut a
+sharp-edged hole in it for a building's basement or whatever.
+
+Another thing you can do is work at a very different scale.
+In Minecraft, a person is just under 2 blocks tall. In
+Ace of Spades, a person is just under 3 blocks tall. Why
+not 4 or 6? Well, partly because you just need a lot more
+voxels; if a meter is 2 voxels in Minecraft and 4 voxels in
+your game, and you draw the same number of voxels due to
+hardware limits, then your game has half the view distance
+of Minecraft. Since stb_voxel_render is designed to keep
+the meshes small and render efficiently, you can push the
+view distance out further than Minecraft--or use a similar
+view distance and a higher voxel resolution. You could also
+stop making infinite worlds and work at entirely different
+scales; where Minecraft is 1 voxel per meter, you could
+have 20 voxels per meter and make a small arena that's
+50 meters wide and 5 meters tall.
+
+Back when the voxel game Voxatron was announced, the weekend
+after the trailer came out I wrote my own little GPU-accelerated
+version of the engine and thought that was pretty cool. I've
+been tempted many times to extract that and release it as a
+library, but I don't want to steal Voxatron's thunder, so I've
+avoided it. You could use this engine to do the same kind of
+thing, although it won't be as efficient as an engine dedicated
+to that style would be.
+
+**Q:**
+What one thing would you really like to see somebody do?
+
+**A:**
+Before Unity, 3D had seemed deeply problematic in the indie
+space. Software like GameMaker has tried to support 3D, but
+it seems like little of note has been done with it.
+
+Minecraft has shown that people can build worlds with the
+Minecraft toolset far more easily than we've ever seen from those
+other tools. Obviously people have done great things with
+Unity, but those people are much closer to professional
+developers; typically they still need real 3D modelling
+and all of that stuff.
+
+So what I'd really like to see is someone build some kind
+of voxel-game-construction-set. Start with stb_voxel_render,
+maybe expose all the flexibility of stb_voxel_render (so
+people can do different things). Throw in Lua or something
+else for scripting, make some kind of editor that feels
+at least as good as Minecraft and Infinifactory, and see
+where that gets you.
+
+**Q:**
+Why'd you make this library?
+
+**A:**
+Mainly as a way of releasing this technology, which I'd been
+working on since 2011 and seemed unlikely to ever ship myself. In 2011
+I was playing the voxel shooter Ace of Spades. One of the maps
+that we played on was a partial port of Broville (which is the
+first Minecraft map in the stb_voxel_render release trailer). I'd
+made a bunch of procedural level generators for the game, and
+I started trying to make a city generator inspired by Broville.
+
+But I realized it would be a lot of work, and of very little
+value (most of my maps didn't get much play because people
+preferred to play on maps where they could charge straight
+at the enemies and shoot them as fast as possible). So I
+wrote my own voxel engine and started working on a procedural
+city game. But I got bogged down after I finally got the road
+generator working and never got anywhere with building
+generation or gameplay.
+
+stb_voxel_render is actually a complete rewrite from scratch,
+but it's based a lot on what I learned from that previous work.
+
+**Q:**
+About the release video... how long did that take to edit?
+
+**A:**
+About seven or eight hours. I had the first version done in
+maybe six or seven hours, but then I realized I'd left out
+one clip, and when I went back to add it I also gussied up
+a couple of other moments in the video. So something basically
+identical to the final cut was done in around six.
+
+**Q:**
+Ok, that's it. Thanks, me.
+
+**A:**
+Thanks *me!*
diff --git a/vendor/stb/docs/why_public_domain.md b/vendor/stb/docs/why_public_domain.md
new file mode 100644
index 0000000..fd3f887
--- /dev/null
+++ b/vendor/stb/docs/why_public_domain.md
@@ -0,0 +1,117 @@
+My collected rationales for placing these libraries
+in the public domain:
+
+1. Public domain vs. viral licenses
+
+ Why is this library public domain?
+ Because more people will use it. Because it's not viral, people are
+ not obligated to give back, so you could argue that it hurts the
+ development of it, and then because it doesn't develop as well it's
+ not as good, and then because it's not as good, in the long run
+ maybe fewer people will use it. I have total respect for that
+ opinion, but I just don't believe it myself for most software.
+
+2. Public domain vs. attribution-required licenses
+
+ The primary difference between public domain and, say, a Creative Commons
+ commercial / non-share-alike / attribution license is solely the
+ requirement for attribution. (Similarly the BSD license and such.)
+ While I would *appreciate* acknowledgement and attribution, I believe
+ that it is foolish to place a legal encumbrance (i.e. a license) on
+ the software *solely* to get attribution.
+
+ In other words, I'm arguing that PD is superior to the BSD license and
+ the Creative Commons 'Attribution' license. If the license offers
+ anything besides attribution -- as does, e.g., CC NonCommercial-ShareAlike,
+ or the GPL -- that's a separate discussion.
+
+3. Other aspects of BSD-style licenses besides attribution
+
+ Permissive licenses like the zlib and BSD licenses are perfectly reasonable
+ in their requirements, but they are very wordy and
+ have only two benefits over public domain: legally-mandated
+ attribution and liability-control. I do not believe these
+ are worth the excessive verbosity and user-unfriendliness
+ these licenses induce, especially in the single-file
+ case where those licenses tend to be at the top of
+ the file, the first thing you see.
+
+ To the specific points, I have had no trouble receiving
+ attribution for my libraries; liability in the face of
+ no explicit disclaimer of liability is an open question,
+ but one I have a lot of difficulty imagining there being
+ any actual doubt about in court. Sometimes I explicitly
+ note in my libraries that I make no guarantees about them
+ being fit for purpose, but it's pretty absurd to do this;
+ as a whole, it comes across as "here is a library to decode
+ vorbis audio files, but it may not actually work and if
+ you have problems it's not my fault, but also please
+ report bugs so I can fix them"--so dumb!
+
+4. Full discussion from stb_howto.txt on what YOU should do for YOUR libs
+
+```
+EASY-TO-COMPLY LICENSE
+
+I make my libraries public domain. You don't have to.
+But my goal in releasing stb-style libraries is to
+reduce friction for potential users as much as
+possible. That means:
+
+ a. easy to build (what this file is mostly about)
+ b. easy to invoke (which requires good API design)
+ c. easy to deploy (which is about licensing)
+
+I choose to place all my libraries in the public
+domain, abjuring copyright, rather than license
+the libraries. This has some benefits and some
+drawbacks.
+
+Any license which is "viral" to modifications
+causes worries for lawyers, even if their programmers
+aren't modifying it.
+
+Any license which requires crediting in documentation
+adds friction which can add up. Valve has a huge list
+(http://nothings.org/remote/ThirdPartyLegalNotices_steam_2019.html)
+of all of these included in each game they ship,
+and it's insane, and obviously nobody ever looks
+at it so why would you care whether your credit
+appeared there?
+
+Permissive licenses like zlib and BSD license are
+perfectly reasonable, but they are very wordy and
+have only two benefits over public domain: legally-mandated
+attribution and liability-control. I do not believe these
+are worth the excessive verbosity and user-unfriendliness
+these licenses induce, especially in the single-file
+case where those licenses tend to be at the top of
+the file, the first thing you see. (To the specific
+points, I have had no trouble receiving attribution
+for my libraries; liability in the face of no explicit
+disclaimer of liability is an open question.)
+
+However, public domain has frictions of its own, because
+public domain declarations aren't necessarily recognized
+in the USA and some other locations. For that reason,
+I recommend a declaration along these lines:
+
+// This software is dual-licensed to the public domain and under the following
+// license: you are granted a perpetual, irrevocable license to copy, modify,
+// publish, and distribute this file as you see fit.
+
+I typically place this declaration at the end of the initial
+comment block of the file and just say 'public domain'
+at the top.
+
+I have had people say they couldn't use one of my
+libraries because it was only "public domain" and didn't
+have the additional fallback clause, who asked if
+I could dual-license it under a traditional license.
+
+My answer: they can create a derivative work by
+modifying one character, and then license that however
+they like. (Indeed, *adding* the zlib or BSD license
+would be such a modification!) Unfortunately, their
+lawyers reportedly didn't like that answer. :(
+```
diff --git a/vendor/stb/stb_c_lexer.h b/vendor/stb/stb_c_lexer.h
new file mode 100644
index 0000000..fd42f1c
--- /dev/null
+++ b/vendor/stb/stb_c_lexer.h
@@ -0,0 +1,941 @@
+// stb_c_lexer.h - v0.12 - public domain Sean Barrett 2013
+// lexer for making little C-like languages with recursive-descent parsers
+//
+// This file provides both the interface and the implementation.
+// To instantiate the implementation,
+// #define STB_C_LEXER_IMPLEMENTATION
+// in *ONE* source file, before #including this file.
+//
+// The default configuration is fairly close to a C lexer, although
+// suffixes on integer constants are not handled (you can override this).
+//
+// History:
+// 0.12 fix compilation bug for NUL support; better support separate inclusion
+// 0.11 fix clang static analysis warning
+// 0.10 fix warnings
+// 0.09 hex floats, no-stdlib fixes
+// 0.08 fix bad pointer comparison
+// 0.07 fix mishandling of hexadecimal constants parsed by strtol
+// 0.06 fix missing next character after ending quote mark (Andreas Fredriksson)
+// 0.05 refixed get_location because github version had lost the fix
+// 0.04 fix octal parsing bug
+// 0.03 added STB_C_LEX_DISCARD_PREPROCESSOR option
+// refactor API to simplify (only one struct instead of two)
+// change literal enum names to have 'lit' at the end
+// 0.02 first public release
+//
+// Status:
+// - haven't tested compiling as C++
+// - haven't tested the float parsing path
+// - haven't tested the non-default-config paths (e.g. non-stdlib)
+// - only tested default-config paths by eyeballing output of self-parse
+//
+// - haven't implemented multiline strings
+// - haven't implemented octal/hex character constants
+// - haven't implemented support for unicode CLEX_char
+// - need to expand error reporting so you don't just get "CLEX_parse_error"
+//
+// Contributors:
+// Arpad Goretity (bugfix)
+// Alan Hickman (hex floats)
+// github:mundusnine (bugfix)
+//
+// LICENSE
+//
+// See end of file for license information.
+
+#ifdef STB_C_LEXER_IMPLEMENTATION
+#ifndef STB_C_LEXER_DEFINITIONS
+// to change the default parsing rules, copy the following lines
+// into your C/C++ file *before* including this, and then replace
+// the Y's with N's for the ones you don't want. This needs to be
+// set to the same values for every place in your program where
+// stb_c_lexer.h is included.
+// --BEGIN--
+
+#if defined(Y) || defined(N)
+#error "Can only use stb_c_lexer in contexts where the preprocessor symbols 'Y' and 'N' are not defined"
+#endif
+
+#define STB_C_LEX_C_DECIMAL_INTS Y // "0|[1-9][0-9]*" CLEX_intlit
+#define STB_C_LEX_C_HEX_INTS Y // "0x[0-9a-fA-F]+" CLEX_intlit
+#define STB_C_LEX_C_OCTAL_INTS Y // "[0-7]+" CLEX_intlit
+#define STB_C_LEX_C_DECIMAL_FLOATS Y // "[0-9]*(.[0-9]*([eE][-+]?[0-9]+)?)" CLEX_floatlit
+#define STB_C_LEX_C99_HEX_FLOATS N // "0x{hex}+(.{hex}*)?[pP][-+]?{hex}+" CLEX_floatlit
+#define STB_C_LEX_C_IDENTIFIERS Y // "[_a-zA-Z][_a-zA-Z0-9]*" CLEX_id
+#define STB_C_LEX_C_DQ_STRINGS Y // double-quote-delimited strings with escapes CLEX_dqstring
+#define STB_C_LEX_C_SQ_STRINGS N // single-quote-delimited strings with escapes CLEX_sqstring
+#define STB_C_LEX_C_CHARS Y // single-quote-delimited character with escape CLEX_charlit
+#define STB_C_LEX_C_COMMENTS Y // "/* comment */"
+#define STB_C_LEX_CPP_COMMENTS Y // "// comment to end of line\n"
+#define STB_C_LEX_C_COMPARISONS Y // "==" CLEX_eq "!=" CLEX_noteq "<=" CLEX_lesseq ">=" CLEX_greatereq
+#define STB_C_LEX_C_LOGICAL Y // "&&" CLEX_andand "||" CLEX_oror
+#define STB_C_LEX_C_SHIFTS Y // "<<" CLEX_shl ">>" CLEX_shr
+#define STB_C_LEX_C_INCREMENTS Y // "++" CLEX_plusplus "--" CLEX_minusminus
+#define STB_C_LEX_C_ARROW Y // "->" CLEX_arrow
+#define STB_C_LEX_EQUAL_ARROW N // "=>" CLEX_eqarrow
+#define STB_C_LEX_C_BITWISEEQ Y // "&=" CLEX_andeq "|=" CLEX_oreq "^=" CLEX_xoreq
+#define STB_C_LEX_C_ARITHEQ Y // "+=" CLEX_pluseq "-=" CLEX_minuseq
+ // "*=" CLEX_muleq "/=" CLEX_diveq "%=" CLEX_modeq
+ // if both STB_C_LEX_SHIFTS & STB_C_LEX_ARITHEQ:
+ // "<<=" CLEX_shleq ">>=" CLEX_shreq
+
+#define STB_C_LEX_PARSE_SUFFIXES N // letters after numbers are parsed as part of those numbers, and must be in suffix list below
+#define STB_C_LEX_DECIMAL_SUFFIXES "" // decimal integer suffixes e.g. "uUlL" -- these are returned as-is in string storage
+#define STB_C_LEX_HEX_SUFFIXES "" // e.g. "uUlL"
+#define STB_C_LEX_OCTAL_SUFFIXES "" // e.g. "uUlL"
+#define STB_C_LEX_FLOAT_SUFFIXES "" //
+
+#define STB_C_LEX_0_IS_EOF N // if Y, ends parsing at '\0'; if N, returns '\0' as token
+#define STB_C_LEX_INTEGERS_AS_DOUBLES N // parses integers as doubles so they can be larger than 'int', but only if STB_C_LEX_USE_STDLIB==N
+#define STB_C_LEX_MULTILINE_DSTRINGS N // allow newlines in double-quoted strings
+#define STB_C_LEX_MULTILINE_SSTRINGS N // allow newlines in single-quoted strings
+#define STB_C_LEX_USE_STDLIB Y // use strtod,strtol for parsing #s; otherwise inaccurate hack
+#define STB_C_LEX_DOLLAR_IDENTIFIER Y // allow $ as an identifier character
+#define STB_C_LEX_FLOAT_NO_DECIMAL Y // allow floats that have no decimal point if they have an exponent
+
+#define STB_C_LEX_DEFINE_ALL_TOKEN_NAMES N // if Y, all CLEX_ token names are defined, even if never returned
+ // leaving it as N should help you catch config bugs
+
+#define STB_C_LEX_DISCARD_PREPROCESSOR Y // discard C-preprocessor directives (note that even after preprocessing
+ // you still have #line, #pragma, etc)
+
+//#define STB_C_LEX_ISWHITE(str) ... // return length in bytes of whitespace characters if first char is whitespace
+
+#define STB_C_LEXER_DEFINITIONS // This line prevents the header file from replacing your definitions
+// --END--
+#endif
+#endif
+
+#ifndef INCLUDE_STB_C_LEXER_H
+#define INCLUDE_STB_C_LEXER_H
+
+typedef struct
+{
+ // lexer variables
+ char *input_stream;
+ char *eof;
+ char *parse_point;
+ char *string_storage;
+ int string_storage_len;
+
+ // lexer parse location for error messages
+ char *where_firstchar;
+ char *where_lastchar;
+
+ // lexer token variables
+ long token;
+ double real_number;
+ long int_number;
+ char *string;
+ int string_len;
+} stb_lexer;
+
+typedef struct
+{
+ int line_number;
+ int line_offset;
+} stb_lex_location;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern void stb_c_lexer_init(stb_lexer *lexer, const char *input_stream, const char *input_stream_end, char *string_store, int store_length);
+// this function initializes the 'lexer' structure
+// Input:
+// - input_stream points to the file to parse, loaded into memory
+// - input_stream_end points to the end of the file, or NULL if you use 0-for-EOF
+// - string_store is storage the lexer can use for storing parsed strings and identifiers
+// - store_length is the length of that storage
+
+extern int stb_c_lexer_get_token(stb_lexer *lexer);
+// this function returns non-zero if a token is parsed, or 0 if at EOF
+// Output:
+// - lexer->token is the token ID, which is the Unicode code point for a single-char token, or < 0 for a multi-char token, eof, or error
+// - lexer->real_number is a double constant value for CLEX_floatlit, or for CLEX_intlit if STB_C_LEX_INTEGERS_AS_DOUBLES
+// - lexer->int_number is an integer constant for CLEX_intlit if !STB_C_LEX_INTEGERS_AS_DOUBLES, or the character for CLEX_charlit
+// - lexer->string is a 0-terminated string for CLEX_dqstring or CLEX_sqstring or CLEX_id
+// - lexer->string_len is the byte length of lexer->string
+
+extern void stb_c_lexer_get_location(const stb_lexer *lexer, const char *where, stb_lex_location *loc);
+// this inefficient function returns the line number and character offset of a
+// given location in the file, as returned by stb_c_lexer_get_token. Because it's
+// inefficient, you should only call it for errors, not for every token.
+// For error messages about invalid tokens, you typically want the location of the
+// start of the token (which caused the token to be invalid). For bugs involving
+// legitimate tokens, you can report the first character or the whole range.
+// Output:
+// - loc->line_number is the line number in the file, counting from 1, of the location
+// - loc->line_offset is the char-offset in the line, counting from 0, of the location
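+
+// Sample usage (a sketch, not part of the original API docs; it assumes the
+// caller has already loaded the file into 'text' with length 'len'):
+//
+//     char store[0x10000];
+//     stb_lexer lex;
+//     stb_c_lexer_init(&lex, text, text+len, store, sizeof(store));
+//     while (stb_c_lexer_get_token(&lex)) {
+//        if (lex.token == CLEX_parse_error) {
+//           stb_lex_location loc;
+//           stb_c_lexer_get_location(&lex, lex.where_firstchar, &loc);
+//           // report the error at loc.line_number / loc.line_offset, then stop
+//           break;
+//        }
+//        // otherwise dispatch on lex.token (CLEX_id, CLEX_intlit, single chars, ...)
+//     }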
+
+
+#ifdef __cplusplus
+}
+#endif
+
+enum
+{
+ CLEX_eof = 256,
+ CLEX_parse_error,
+ CLEX_intlit ,
+ CLEX_floatlit ,
+ CLEX_id ,
+ CLEX_dqstring ,
+ CLEX_sqstring ,
+ CLEX_charlit ,
+ CLEX_eq ,
+ CLEX_noteq ,
+ CLEX_lesseq ,
+ CLEX_greatereq ,
+ CLEX_andand ,
+ CLEX_oror ,
+ CLEX_shl ,
+ CLEX_shr ,
+ CLEX_plusplus ,
+ CLEX_minusminus ,
+ CLEX_pluseq ,
+ CLEX_minuseq ,
+ CLEX_muleq ,
+ CLEX_diveq ,
+ CLEX_modeq ,
+ CLEX_andeq ,
+ CLEX_oreq ,
+ CLEX_xoreq ,
+ CLEX_arrow ,
+ CLEX_eqarrow ,
+ CLEX_shleq, CLEX_shreq,
+
+ CLEX_first_unused_token
+
+};
+#endif // INCLUDE_STB_C_LEXER_H
+
+#ifdef STB_C_LEXER_IMPLEMENTATION
+
+// Hacky definitions so we can easily #if on them
+#define Y(x) 1
+#define N(x) 0
+
+#if STB_C_LEX_INTEGERS_AS_DOUBLES(x)
+typedef double stb__clex_int;
+#define intfield real_number
+#define STB__clex_int_as_double
+#else
+typedef long stb__clex_int;
+#define intfield int_number
+#endif
+
+// Convert these config options to simple conditional #defines so we can more
+// easily test them once we've changed the meaning of Y/N
+
+#if STB_C_LEX_PARSE_SUFFIXES(x)
+#define STB__clex_parse_suffixes
+#endif
+
+#if STB_C_LEX_C99_HEX_FLOATS(x)
+#define STB__clex_hex_floats
+#endif
+
+#if STB_C_LEX_C_HEX_INTS(x)
+#define STB__clex_hex_ints
+#endif
+
+#if STB_C_LEX_C_DECIMAL_INTS(x)
+#define STB__clex_decimal_ints
+#endif
+
+#if STB_C_LEX_C_OCTAL_INTS(x)
+#define STB__clex_octal_ints
+#endif
+
+#if STB_C_LEX_C_DECIMAL_FLOATS(x)
+#define STB__clex_decimal_floats
+#endif
+
+#if STB_C_LEX_DISCARD_PREPROCESSOR(x)
+#define STB__clex_discard_preprocessor
+#endif
+
+#if STB_C_LEX_USE_STDLIB(x) && (!defined(STB__clex_hex_floats) || __STDC_VERSION__ >= 199901L)
+#define STB__CLEX_use_stdlib
+#include <stdlib.h>
+#endif
+
+// Now for the rest of the file we'll use the basic definition
+// where Y expands to its contents and N expands to nothing
+#undef Y
+#define Y(a) a
+#undef N
+#define N(a)
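+
+// For example, given the definitions above, a guarded fragment such as
+//    STB_C_LEX_C_COMMENTS( if (p[0] == '/' && p[1] == '*') { ... } )
+// expands to the guarded code when the option is set to Y, and to nothing
+// at all when it is set to N, compiling the feature out entirely.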
+
+// API function
+void stb_c_lexer_init(stb_lexer *lexer, const char *input_stream, const char *input_stream_end, char *string_store, int store_length)
+{
+ lexer->input_stream = (char *) input_stream;
+ lexer->eof = (char *) input_stream_end;
+ lexer->parse_point = (char *) input_stream;
+ lexer->string_storage = string_store;
+ lexer->string_storage_len = store_length;
+}
+
+// API function
+void stb_c_lexer_get_location(const stb_lexer *lexer, const char *where, stb_lex_location *loc)
+{
+ char *p = lexer->input_stream;
+ int line_number = 1;
+ int char_offset = 0;
+ while (*p && p < where) {
+ if (*p == '\n' || *p == '\r') {
+ p += (p[0]+p[1] == '\r'+'\n' ? 2 : 1); // skip newline
+ line_number += 1;
+ char_offset = 0;
+ } else {
+ ++p;
+ ++char_offset;
+ }
+ }
+ loc->line_number = line_number;
+ loc->line_offset = char_offset;
+}
+
+// main helper function for returning a parsed token
+static int stb__clex_token(stb_lexer *lexer, int token, char *start, char *end)
+{
+ lexer->token = token;
+ lexer->where_firstchar = start;
+ lexer->where_lastchar = end;
+ lexer->parse_point = end+1;
+ return 1;
+}
+
+// helper function for returning eof
+static int stb__clex_eof(stb_lexer *lexer)
+{
+ lexer->token = CLEX_eof;
+ return 0;
+}
+
+static int stb__clex_iswhite(int x)
+{
+ return x == ' ' || x == '\t' || x == '\r' || x == '\n' || x == '\f';
+}
+
+static const char *stb__strchr(const char *str, int ch)
+{
+ for (; *str; ++str)
+ if (*str == ch)
+ return str;
+ return 0;
+}
+
+// parse suffixes at the end of a number
+static int stb__clex_parse_suffixes(stb_lexer *lexer, long tokenid, char *start, char *cur, const char *suffixes)
+{
+ #ifdef STB__clex_parse_suffixes
+ lexer->string = lexer->string_storage;
+ lexer->string_len = 0;
+
+ while ((*cur >= 'a' && *cur <= 'z') || (*cur >= 'A' && *cur <= 'Z')) {
+ if (stb__strchr(suffixes, *cur) == 0)
+ return stb__clex_token(lexer, CLEX_parse_error, start, cur);
+ if (lexer->string_len+1 >= lexer->string_storage_len)
+ return stb__clex_token(lexer, CLEX_parse_error, start, cur);
+ lexer->string[lexer->string_len++] = *cur++;
+ }
+ #else
+ suffixes = suffixes; // attempt to suppress warnings
+ #endif
+ return stb__clex_token(lexer, tokenid, start, cur-1);
+}
+
+#ifndef STB__CLEX_use_stdlib
+static double stb__clex_pow(double base, unsigned int exponent)
+{
+ double value=1;
+ for ( ; exponent; exponent >>= 1) {
+ if (exponent & 1)
+ value *= base;
+ base *= base;
+ }
+ return value;
+}
+
+static double stb__clex_parse_float(char *p, char **q)
+{
+ char *s = p;
+ double value=0;
+ int base=10;
+ int exponent=0;
+
+#ifdef STB__clex_hex_floats
+ if (*p == '0') {
+ if (p[1] == 'x' || p[1] == 'X') {
+ base=16;
+ p += 2;
+ }
+ }
+#endif
+
+ for (;;) {
+ if (*p >= '0' && *p <= '9')
+ value = value*base + (*p++ - '0');
+#ifdef STB__clex_hex_floats
+ else if (base == 16 && *p >= 'a' && *p <= 'f')
+ value = value*base + 10 + (*p++ - 'a');
+ else if (base == 16 && *p >= 'A' && *p <= 'F')
+ value = value*base + 10 + (*p++ - 'A');
+#endif
+ else
+ break;
+ }
+
+ if (*p == '.') {
+ double pow, addend = 0;
+ ++p;
+ for (pow=1; ; pow*=base) {
+ if (*p >= '0' && *p <= '9')
+ addend = addend*base + (*p++ - '0');
+#ifdef STB__clex_hex_floats
+ else if (base == 16 && *p >= 'a' && *p <= 'f')
+ addend = addend*base + 10 + (*p++ - 'a');
+ else if (base == 16 && *p >= 'A' && *p <= 'F')
+ addend = addend*base + 10 + (*p++ - 'A');
+#endif
+ else
+ break;
+ }
+ value += addend / pow;
+ }
+#ifdef STB__clex_hex_floats
+ if (base == 16) {
+ // exponent required for hex float literal
+ if (*p != 'p' && *p != 'P') {
+ *q = s;
+ return 0;
+ }
+ exponent = 1;
+ } else
+#endif
+ exponent = (*p == 'e' || *p == 'E');
+
+ if (exponent) {
+ int sign = p[1] == '-';
+ unsigned int exponent=0;
+ double power=1;
+ ++p;
+ if (*p == '-' || *p == '+')
+ ++p;
+ while (*p >= '0' && *p <= '9')
+ exponent = exponent*10 + (*p++ - '0');
+
+#ifdef STB__clex_hex_floats
+ if (base == 16)
+ power = stb__clex_pow(2, exponent);
+ else
+#endif
+ power = stb__clex_pow(10, exponent);
+ if (sign)
+ value /= power;
+ else
+ value *= power;
+ }
+ *q = p;
+ return value;
+}
+#endif
+
+static int stb__clex_parse_char(char *p, char **q)
+{
+ if (*p == '\\') {
+ *q = p+2; // tentatively guess we'll parse two characters
+ switch(p[1]) {
+ case '\\': return '\\';
+ case '\'': return '\'';
+ case '"': return '"';
+ case 't': return '\t';
+ case 'f': return '\f';
+ case 'n': return '\n';
+ case 'r': return '\r';
+ case '0': return '\0'; // @TODO octal constants
+ case 'x': case 'X': return -1; // @TODO hex constants
+ case 'u': return -1; // @TODO unicode constants
+ }
+ }
+ *q = p+1;
+ return (unsigned char) *p;
+}
+
+static int stb__clex_parse_string(stb_lexer *lexer, char *p, int type)
+{
+ char *start = p;
+ char delim = *p++; // grab the " or ' for later matching
+ char *out = lexer->string_storage;
+ char *outend = lexer->string_storage + lexer->string_storage_len;
+ while (*p != delim) {
+ int n;
+ if (*p == '\\') {
+ char *q;
+ n = stb__clex_parse_char(p, &q);
+ if (n < 0)
+ return stb__clex_token(lexer, CLEX_parse_error, start, q);
+ p = q;
+ } else {
+ // @OPTIMIZE: could speed this up by looping-while-not-backslash
+ n = (unsigned char) *p++;
+ }
+ if (out+1 >= outend) // leave room for the 0 terminator written after the loop
+ return stb__clex_token(lexer, CLEX_parse_error, start, p);
+ // @TODO expand unicode escapes to UTF8
+ *out++ = (char) n;
+ }
+ *out = 0;
+ lexer->string = lexer->string_storage;
+ lexer->string_len = (int) (out - lexer->string_storage);
+ return stb__clex_token(lexer, type, start, p);
+}
+
+int stb_c_lexer_get_token(stb_lexer *lexer)
+{
+ char *p = lexer->parse_point;
+
+ // skip whitespace and comments
+ for (;;) {
+ #ifdef STB_C_LEX_ISWHITE
+ while (p != lexer->eof) {
+ int n;
+ n = STB_C_LEX_ISWHITE(p);
+ if (n == 0) break;
+ if (lexer->eof && lexer->eof - p < n)
+ return stb__clex_token(lexer, CLEX_parse_error, p, lexer->eof-1);
+ p += n;
+ }
+ #else
+ while (p != lexer->eof && stb__clex_iswhite(*p))
+ ++p;
+ #endif
+
+ STB_C_LEX_CPP_COMMENTS(
+ if (p != lexer->eof && p[0] == '/' && p[1] == '/') {
+ while (p != lexer->eof && *p != '\r' && *p != '\n')
+ ++p;
+ continue;
+ }
+ )
+
+ STB_C_LEX_C_COMMENTS(
+ if (p != lexer->eof && p[0] == '/' && p[1] == '*') {
+ char *start = p;
+ p += 2;
+ while (p != lexer->eof && (p[0] != '*' || p[1] != '/'))
+ ++p;
+ if (p == lexer->eof)
+ return stb__clex_token(lexer, CLEX_parse_error, start, p-1);
+ p += 2;
+ continue;
+ }
+ )
+
+ #ifdef STB__clex_discard_preprocessor
+ // @TODO this discards everything after a '#', regardless
+ // of where in the line the # is, rather than requiring it
+ // be at the start. (because this parser doesn't otherwise
+ // check for line breaks!)
+ if (p != lexer->eof && p[0] == '#') {
+ while (p != lexer->eof && *p != '\r' && *p != '\n')
+ ++p;
+ continue;
+ }
+ #endif
+
+ break;
+ }
+
+ if (p == lexer->eof)
+ return stb__clex_eof(lexer);
+
+ switch (*p) {
+ default:
+ if ( (*p >= 'a' && *p <= 'z')
+ || (*p >= 'A' && *p <= 'Z')
+ || *p == '_' || (unsigned char) *p >= 128 // >= 128 is UTF8 char
+ STB_C_LEX_DOLLAR_IDENTIFIER( || *p == '$' ) )
+ {
+ int n = 0;
+ lexer->string = lexer->string_storage;
+ do {
+ if (n+1 >= lexer->string_storage_len)
+ return stb__clex_token(lexer, CLEX_parse_error, p, p+n);
+ lexer->string[n] = p[n];
+ ++n;
+ } while (
+ (p[n] >= 'a' && p[n] <= 'z')
+ || (p[n] >= 'A' && p[n] <= 'Z')
+ || (p[n] >= '0' && p[n] <= '9') // allow digits in middle of identifier
+ || p[n] == '_' || (unsigned char) p[n] >= 128
+ STB_C_LEX_DOLLAR_IDENTIFIER( || p[n] == '$' )
+ );
+ lexer->string[n] = 0;
+ lexer->string_len = n;
+ return stb__clex_token(lexer, CLEX_id, p, p+n-1);
+ }
+
+ // check for EOF
+ STB_C_LEX_0_IS_EOF(
+ if (*p == 0)
+ return stb__clex_eof(lexer);
+ )
+
+ single_char:
+ // not an identifier, return the character as itself
+ return stb__clex_token(lexer, *p, p, p);
+
+ case '+':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_INCREMENTS(if (p[1] == '+') return stb__clex_token(lexer, CLEX_plusplus, p,p+1);)
+ STB_C_LEX_C_ARITHEQ( if (p[1] == '=') return stb__clex_token(lexer, CLEX_pluseq , p,p+1);)
+ }
+ goto single_char;
+ case '-':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_INCREMENTS(if (p[1] == '-') return stb__clex_token(lexer, CLEX_minusminus, p,p+1);)
+ STB_C_LEX_C_ARITHEQ( if (p[1] == '=') return stb__clex_token(lexer, CLEX_minuseq , p,p+1);)
+ STB_C_LEX_C_ARROW( if (p[1] == '>') return stb__clex_token(lexer, CLEX_arrow , p,p+1);)
+ }
+ goto single_char;
+ case '&':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_LOGICAL( if (p[1] == '&') return stb__clex_token(lexer, CLEX_andand, p,p+1);)
+ STB_C_LEX_C_BITWISEEQ(if (p[1] == '=') return stb__clex_token(lexer, CLEX_andeq , p,p+1);)
+ }
+ goto single_char;
+ case '|':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_LOGICAL( if (p[1] == '|') return stb__clex_token(lexer, CLEX_oror, p,p+1);)
+ STB_C_LEX_C_BITWISEEQ(if (p[1] == '=') return stb__clex_token(lexer, CLEX_oreq, p,p+1);)
+ }
+ goto single_char;
+ case '=':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_COMPARISONS(if (p[1] == '=') return stb__clex_token(lexer, CLEX_eq, p,p+1);)
+ STB_C_LEX_EQUAL_ARROW( if (p[1] == '>') return stb__clex_token(lexer, CLEX_eqarrow, p,p+1);)
+ }
+ goto single_char;
+ case '!':
+ STB_C_LEX_C_COMPARISONS(if (p+1 != lexer->eof && p[1] == '=') return stb__clex_token(lexer, CLEX_noteq, p,p+1);)
+ goto single_char;
+ case '^':
+ STB_C_LEX_C_BITWISEEQ(if (p+1 != lexer->eof && p[1] == '=') return stb__clex_token(lexer, CLEX_xoreq, p,p+1));
+ goto single_char;
+ case '%':
+ STB_C_LEX_C_ARITHEQ(if (p+1 != lexer->eof && p[1] == '=') return stb__clex_token(lexer, CLEX_modeq, p,p+1));
+ goto single_char;
+ case '*':
+ STB_C_LEX_C_ARITHEQ(if (p+1 != lexer->eof && p[1] == '=') return stb__clex_token(lexer, CLEX_muleq, p,p+1));
+ goto single_char;
+ case '/':
+ STB_C_LEX_C_ARITHEQ(if (p+1 != lexer->eof && p[1] == '=') return stb__clex_token(lexer, CLEX_diveq, p,p+1));
+ goto single_char;
+ case '<':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_COMPARISONS(if (p[1] == '=') return stb__clex_token(lexer, CLEX_lesseq, p,p+1);)
+ STB_C_LEX_C_SHIFTS( if (p[1] == '<') {
+ STB_C_LEX_C_ARITHEQ(if (p+2 != lexer->eof && p[2] == '=')
+ return stb__clex_token(lexer, CLEX_shleq, p,p+2);)
+ return stb__clex_token(lexer, CLEX_shl, p,p+1);
+ }
+ )
+ }
+ goto single_char;
+ case '>':
+ if (p+1 != lexer->eof) {
+ STB_C_LEX_C_COMPARISONS(if (p[1] == '=') return stb__clex_token(lexer, CLEX_greatereq, p,p+1);)
+ STB_C_LEX_C_SHIFTS( if (p[1] == '>') {
+ STB_C_LEX_C_ARITHEQ(if (p+2 != lexer->eof && p[2] == '=')
+ return stb__clex_token(lexer, CLEX_shreq, p,p+2);)
+ return stb__clex_token(lexer, CLEX_shr, p,p+1);
+ }
+ )
+ }
+ goto single_char;
+
+ case '"':
+ STB_C_LEX_C_DQ_STRINGS(return stb__clex_parse_string(lexer, p, CLEX_dqstring);)
+ goto single_char;
+ case '\'':
+ STB_C_LEX_C_SQ_STRINGS(return stb__clex_parse_string(lexer, p, CLEX_sqstring);)
+ STB_C_LEX_C_CHARS(
+ {
+ char *start = p;
+ lexer->int_number = stb__clex_parse_char(p+1, &p);
+ if (lexer->int_number < 0)
+ return stb__clex_token(lexer, CLEX_parse_error, start,start);
+ if (p == lexer->eof || *p != '\'')
+ return stb__clex_token(lexer, CLEX_parse_error, start,p);
+ return stb__clex_token(lexer, CLEX_charlit, start, p+1);
+ })
+ goto single_char;
+
+ case '0':
+ #if defined(STB__clex_hex_ints) || defined(STB__clex_hex_floats)
+ if (p+1 != lexer->eof) {
+ if (p[1] == 'x' || p[1] == 'X') {
+ char *q;
+
+ #ifdef STB__clex_hex_floats
+ for (q=p+2;
+ q != lexer->eof && ((*q >= '0' && *q <= '9') || (*q >= 'a' && *q <= 'f') || (*q >= 'A' && *q <= 'F'));
+ ++q);
+ if (q != lexer->eof) {
+ if (*q == '.' STB_C_LEX_FLOAT_NO_DECIMAL(|| *q == 'p' || *q == 'P')) {
+ #ifdef STB__CLEX_use_stdlib
+ lexer->real_number = strtod((char *) p, (char**) &q);
+ #else
+ lexer->real_number = stb__clex_parse_float(p, &q);
+ #endif
+
+ if (p == q)
+ return stb__clex_token(lexer, CLEX_parse_error, p,q);
+ return stb__clex_parse_suffixes(lexer, CLEX_floatlit, p,q, STB_C_LEX_FLOAT_SUFFIXES);
+
+ }
+ }
+ #endif // STB__clex_hex_floats
+
+ #ifdef STB__clex_hex_ints
+ #ifdef STB__CLEX_use_stdlib
+ lexer->int_number = strtol((char *) p, (char **) &q, 16);
+ #else
+ {
+ stb__clex_int n=0;
+ for (q=p+2; q != lexer->eof; ++q) {
+ if (*q >= '0' && *q <= '9')
+ n = n*16 + (*q - '0');
+ else if (*q >= 'a' && *q <= 'f')
+ n = n*16 + (*q - 'a') + 10;
+ else if (*q >= 'A' && *q <= 'F')
+ n = n*16 + (*q - 'A') + 10;
+ else
+ break;
+ }
+ lexer->int_number = n;
+ }
+ #endif
+ if (q == p+2) // "0x" with no hex digits after it
+ return stb__clex_token(lexer, CLEX_parse_error, p, p+1);
+ return stb__clex_parse_suffixes(lexer, CLEX_intlit, p,q, STB_C_LEX_HEX_SUFFIXES);
+ #endif
+ }
+ }
+ #endif // defined(STB__clex_hex_ints) || defined(STB__clex_hex_floats)
+ // can't test for octal because we might parse '0.0' as float or as '0' '.' '0',
+ // so have to do float first
+
+ /* FALL THROUGH */
+ case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9':
+
+ #ifdef STB__clex_decimal_floats
+ {
+ char *q = p;
+ while (q != lexer->eof && (*q >= '0' && *q <= '9'))
+ ++q;
+ if (q != lexer->eof) {
+ if (*q == '.' STB_C_LEX_FLOAT_NO_DECIMAL(|| *q == 'e' || *q == 'E')) {
+ #ifdef STB__CLEX_use_stdlib
+ lexer->real_number = strtod((char *) p, (char**) &q);
+ #else
+ lexer->real_number = stb__clex_parse_float(p, &q);
+ #endif
+
+ return stb__clex_parse_suffixes(lexer, CLEX_floatlit, p,q, STB_C_LEX_FLOAT_SUFFIXES);
+
+ }
+ }
+ }
+ #endif // STB__clex_decimal_floats
+
+ #ifdef STB__clex_octal_ints
+ if (p[0] == '0') {
+ char *q = p;
+ #ifdef STB__CLEX_use_stdlib
+ lexer->int_number = strtol((char *) p, (char **) &q, 8);
+ #else
+ stb__clex_int n=0;
+ while (q != lexer->eof) {
+ if (*q >= '0' && *q <= '7')
+ n = n*8 + (*q - '0');
+ else
+ break;
+ ++q;
+ }
+ if (q != lexer->eof && (*q == '8' || *q=='9'))
+ return stb__clex_token(lexer, CLEX_parse_error, p, q);
+ lexer->int_number = n;
+ #endif
+ return stb__clex_parse_suffixes(lexer, CLEX_intlit, p,q, STB_C_LEX_OCTAL_SUFFIXES);
+ }
+ #endif // STB__clex_octal_ints
+
+ #ifdef STB__clex_decimal_ints
+ {
+ char *q = p;
+ #ifdef STB__CLEX_use_stdlib
+ lexer->int_number = strtol((char *) p, (char **) &q, 10);
+ #else
+ stb__clex_int n=0;
+ while (q != lexer->eof) {
+ if (*q >= '0' && *q <= '9')
+ n = n*10 + (*q - '0');
+ else
+ break;
+ ++q;
+ }
+ lexer->int_number = n;
+ #endif
+ return stb__clex_parse_suffixes(lexer, CLEX_intlit, p,q, STB_C_LEX_DECIMAL_SUFFIXES);
+ }
+ #endif // STB__clex_decimal_ints
+ goto single_char;
+ }
+}
+#endif // STB_C_LEXER_IMPLEMENTATION
+
+#ifdef STB_C_LEXER_SELF_TEST
+#define _CRT_SECURE_NO_WARNINGS
+#include <stdio.h>
+#include <stdlib.h>
+
+static void print_token(stb_lexer *lexer)
+{
+ switch (lexer->token) {
+ case CLEX_id : printf("_%s", lexer->string); break;
+ case CLEX_eq : printf("=="); break;
+ case CLEX_noteq : printf("!="); break;
+ case CLEX_lesseq : printf("<="); break;
+ case CLEX_greatereq : printf(">="); break;
+ case CLEX_andand : printf("&&"); break;
+ case CLEX_oror : printf("||"); break;
+ case CLEX_shl : printf("<<"); break;
+ case CLEX_shr : printf(">>"); break;
+ case CLEX_plusplus : printf("++"); break;
+ case CLEX_minusminus: printf("--"); break;
+ case CLEX_arrow : printf("->"); break;
+ case CLEX_andeq : printf("&="); break;
+ case CLEX_oreq : printf("|="); break;
+ case CLEX_xoreq : printf("^="); break;
+ case CLEX_pluseq : printf("+="); break;
+ case CLEX_minuseq : printf("-="); break;
+ case CLEX_muleq : printf("*="); break;
+ case CLEX_diveq : printf("/="); break;
+ case CLEX_modeq : printf("%%="); break;
+ case CLEX_shleq : printf("<<="); break;
+ case CLEX_shreq : printf(">>="); break;
+ case CLEX_eqarrow : printf("=>"); break;
+ case CLEX_dqstring : printf("\"%s\"", lexer->string); break;
+ case CLEX_sqstring : printf("'\"%s\"'", lexer->string); break;
+ case CLEX_charlit : printf("'%s'", lexer->string); break;
+ #if defined(STB__clex_int_as_double) && !defined(STB__CLEX_use_stdlib)
+ case CLEX_intlit : printf("#%g", lexer->real_number); break;
+ #else
+ case CLEX_intlit : printf("#%ld", lexer->int_number); break;
+ #endif
+ case CLEX_floatlit : printf("%g", lexer->real_number); break;
+ default:
+ if (lexer->token >= 0 && lexer->token < 256)
+ printf("%c", (int) lexer->token);
+ else {
+ printf("<<<UNKNOWN TOKEN %ld >>>\n", lexer->token);
+ }
+ break;
+ }
+}
+
+/* Force a test
+of parsing
+multiline comments */
+
+/*/ comment /*/
+/**/ extern /**/
+
+void dummy(void)
+{
+ double some_floats[] = {
+ 1.0501, -10.4e12, 5E+10,
+#if 0 // not supported in C++ or C-pre-99, so don't try to compile it, but let our parser test it
+ 0x1.0p+24, 0xff.FP-8, 0x1p-23,
+#endif
+ 4.
+ };
+ (void) sizeof(some_floats);
+ (void) some_floats[1];
+
+ printf("test %d",1); // https://github.com/nothings/stb/issues/13
+}
+
+int main(int argc, char **argv)
+{
+ FILE *f = fopen("stb_c_lexer.h","rb");
+ char *text = (char *) malloc(1 << 20);
+ int len = f ? (int) fread(text, 1, 1<<20, f) : -1;
+ stb_lexer lex;
+ if (len < 0) {
+ fprintf(stderr, "Error opening file\n");
+ free(text);
+ if (f) fclose(f);
+ return 1;
+ }
+ fclose(f);
+
+ stb_c_lexer_init(&lex, text, text+len, (char *) malloc(0x10000), 0x10000);
+ while (stb_c_lexer_get_token(&lex)) {
+ if (lex.token == CLEX_parse_error) {
+ printf("\n<<<PARSE ERROR>>>\n");
+ break;
+ }
+ print_token(&lex);
+ printf(" ");
+ }
+ return 0;
+}
+#endif
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_connected_components.h b/vendor/stb/stb_connected_components.h
new file mode 100644
index 0000000..f762f65
--- /dev/null
+++ b/vendor/stb/stb_connected_components.h
@@ -0,0 +1,1049 @@
+// stb_connected_components - v0.96 - public domain connected components on grids
+// http://github.com/nothings/stb
+//
+// Finds connected components on 2D grids for testing reachability between
+// two points, with fast updates when changing reachability (e.g. on one machine
+// it was typically 0.2ms w/ 1024x1024 grid). Each grid square must be "open" or
+// "closed" (traversable or untraversable), and grid squares are only connected
+// to their orthogonal neighbors, not diagonally.
+//
+// In one source file, create the implementation by doing something like this:
+//
+// #define STBCC_GRID_COUNT_X_LOG2 10
+// #define STBCC_GRID_COUNT_Y_LOG2 10
+// #define STB_CONNECTED_COMPONENTS_IMPLEMENTATION
+// #include "stb_connected_components.h"
+//
+// The above creates an implementation that can run on maps up to 1024x1024.
+// Map sizes must be a multiple of (1<<(LOG2/2)) on each axis (e.g. 32 if LOG2=10,
+// 16 if LOG2=8, etc.) (You can just pad your map with untraversable space.)
+//
+// MEMORY USAGE
+//
+// Uses about 6-7 bytes per grid square (e.g. 7MB for a 1024x1024 grid).
+// Uses a single worst-case allocation which you pass in.
+//
+// PERFORMANCE
+//
+// On a core i7-2700K at 3.5 GHz, for a particular 1024x1024 map (map_03.png):
+//
+// Creating map : 44.85 ms
+// Making one square traversable: 0.27 ms (average over 29,448 calls)
+// Making one square untraversable: 0.23 ms (average over 30,123 calls)
+// Reachability query: 0.00001 ms (average over 4,000,000 calls)
+//
+// On non-degenerate maps update time is O(N^0.5), but on degenerate maps like
+// checkerboards or 50% random, update time is O(N^0.75) (~2ms on above machine).
+//
+// CHANGELOG
+//
+// 0.96 (2019-03-04) Fix warnings
+// 0.95 (2016-10-16) Bugfix if multiple clumps in one cluster connect to same clump in another
+// 0.94 (2016-04-17) Bugfix & optimize worst case (checkerboard & random)
+// 0.93 (2016-04-16) Reduce memory by 10x for 1Kx1K map; small speedup
+// 0.92 (2016-04-16) Compute sqrt(N) cluster size by default
+// 0.91 (2016-04-15) Initial release
+//
+// TODO:
+// - better API documentation
+// - more comments
+// - try re-integrating naive algorithm & compare performance
+// - more optimized batching (current approach still recomputes local clumps many times)
+// - function for setting a grid of squares at once (just use batching)
+//
+// LICENSE
+//
+// See end of file for license information.
+//
+// ALGORITHM
+//
+// The NxN grid map is split into sqrt(N) x sqrt(N) blocks called
+// "clusters". Each cluster independently computes a set of connected
+// components within that cluster (ignoring all connectivity out of
+// that cluster) using a union-find disjoint set forest. This produces a bunch
+// of locally connected components called "clumps". Each clump is (a) connected
+// within its cluster, (b) does not directly connect to any other clumps in the
+// cluster (though it may connect to them by paths that lead outside the cluster,
+// but those are ignored at this step), and (c) maintains an adjacency list of
+// all clumps in adjacent clusters that it _is_ connected to. Then a second
+// union-find disjoint set forest is used to compute connected clumps
+// globally, across the whole map. Reachability is then computed by
+// finding which clump each input point belongs to, and checking whether
+// those clumps are in the same "global" connected component.
+//
+// The above data structure can be updated efficiently; on a change
+// of a single grid square on the map, only one cluster changes its
+// purely-local state, so only one cluster needs its clumps fully
+// recomputed. Clumps in adjacent clusters need their adjacency lists
+// updated: first to remove all references to the old clumps in the
+// rebuilt cluster, then to add new references to the new clumps. Both
+// of these operations can use the existing "find which clump each input
+// point belongs to" query to compute that adjacency information rapidly.
+
+#ifndef INCLUDE_STB_CONNECTED_COMPONENTS_H
+#define INCLUDE_STB_CONNECTED_COMPONENTS_H
+
+#include <stdlib.h>
+
+typedef struct st_stbcc_grid stbcc_grid;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+//////////////////////////////////////////////////////////////////////////////////////////
+//
+// initialization
+//
+
+// you allocate the grid data structure to this size (note that it will be very big!!!)
+extern size_t stbcc_grid_sizeof(void);
+
+// initialize the grid, value of map[] is 0 = traversable, non-0 is solid
+extern void stbcc_init_grid(stbcc_grid *g, unsigned char *map, int w, int h);
+
+
+//////////////////////////////////////////////////////////////////////////////////////////
+//
+// main functionality
+//
+
+// update a grid square state, 0 = traversable, non-0 is solid
+// i can add a batch-update if it's needed
+extern void stbcc_update_grid(stbcc_grid *g, int x, int y, int solid);
+
+// query if two grid squares are reachable from each other
+extern int stbcc_query_grid_node_connection(stbcc_grid *g, int x1, int y1, int x2, int y2);
+
+
+//////////////////////////////////////////////////////////////////////////////////////////
+//
+// bonus functions
+//
+
+// wrap multiple stbcc_update_grid calls in these functions to compute
+// multiple updates more efficiently; cannot make queries inside batch
+extern void stbcc_update_batch_begin(stbcc_grid *g);
+extern void stbcc_update_batch_end(stbcc_grid *g);
+
+// query the grid data structure for whether a given square is open or not
+extern int stbcc_query_grid_open(stbcc_grid *g, int x, int y);
+
+// get a unique id for the connected component this is in; it's not necessarily
+// small, you'll need a hash table or something to remap it (or just use
+// stbcc_query_grid_node_connection)
+extern unsigned int stbcc_get_unique_id(stbcc_grid *g, int x, int y);
+#define STBCC_NULL_UNIQUE_ID 0xffffffff // returned for closed map squares
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // INCLUDE_STB_CONNECTED_COMPONENTS_H
+
+#ifdef STB_CONNECTED_COMPONENTS_IMPLEMENTATION
+
+#include <assert.h>
+#include <string.h> // memset
+
+#if !defined(STBCC_GRID_COUNT_X_LOG2) || !defined(STBCC_GRID_COUNT_Y_LOG2)
+ #error "You must define STBCC_GRID_COUNT_X_LOG2 and STBCC_GRID_COUNT_Y_LOG2 to define the max grid supported."
+#endif
+
+#define STBCC__GRID_COUNT_X (1 << STBCC_GRID_COUNT_X_LOG2)
+#define STBCC__GRID_COUNT_Y (1 << STBCC_GRID_COUNT_Y_LOG2)
+
+#define STBCC__MAP_STRIDE (1 << (STBCC_GRID_COUNT_X_LOG2-3))
+
+#ifndef STBCC_CLUSTER_SIZE_X_LOG2
+ #define STBCC_CLUSTER_SIZE_X_LOG2 (STBCC_GRID_COUNT_X_LOG2/2) // log2(sqrt(2^N)) = 1/2 * log2(2^N) = 1/2 * N
+ #if STBCC_CLUSTER_SIZE_X_LOG2 > 6
+ #undef STBCC_CLUSTER_SIZE_X_LOG2
+ #define STBCC_CLUSTER_SIZE_X_LOG2 6
+ #endif
+#endif
+
+#ifndef STBCC_CLUSTER_SIZE_Y_LOG2
+ #define STBCC_CLUSTER_SIZE_Y_LOG2 (STBCC_GRID_COUNT_Y_LOG2/2)
+ #if STBCC_CLUSTER_SIZE_Y_LOG2 > 6
+ #undef STBCC_CLUSTER_SIZE_Y_LOG2
+ #define STBCC_CLUSTER_SIZE_Y_LOG2 6
+ #endif
+#endif
+
+#define STBCC__CLUSTER_SIZE_X (1 << STBCC_CLUSTER_SIZE_X_LOG2)
+#define STBCC__CLUSTER_SIZE_Y (1 << STBCC_CLUSTER_SIZE_Y_LOG2)
+
+#define STBCC__CLUSTER_COUNT_X_LOG2 (STBCC_GRID_COUNT_X_LOG2 - STBCC_CLUSTER_SIZE_X_LOG2)
+#define STBCC__CLUSTER_COUNT_Y_LOG2 (STBCC_GRID_COUNT_Y_LOG2 - STBCC_CLUSTER_SIZE_Y_LOG2)
+
+#define STBCC__CLUSTER_COUNT_X (1 << STBCC__CLUSTER_COUNT_X_LOG2)
+#define STBCC__CLUSTER_COUNT_Y (1 << STBCC__CLUSTER_COUNT_Y_LOG2)
+
+#if STBCC__CLUSTER_SIZE_X >= STBCC__GRID_COUNT_X || STBCC__CLUSTER_SIZE_Y >= STBCC__GRID_COUNT_Y
+ #error "STBCC_CLUSTER_SIZE_X/Y_LOG2 must be smaller than STBCC_GRID_COUNT_X/Y_LOG2"
+#endif
+
+// worst case # of clumps per cluster
+#define STBCC__MAX_CLUMPS_PER_CLUSTER_LOG2 (STBCC_CLUSTER_SIZE_X_LOG2 + STBCC_CLUSTER_SIZE_Y_LOG2-1)
+#define STBCC__MAX_CLUMPS_PER_CLUSTER (1 << STBCC__MAX_CLUMPS_PER_CLUSTER_LOG2)
+#define STBCC__MAX_CLUMPS (STBCC__MAX_CLUMPS_PER_CLUSTER * STBCC__CLUSTER_COUNT_X * STBCC__CLUSTER_COUNT_Y)
+#define STBCC__NULL_CLUMPID STBCC__MAX_CLUMPS_PER_CLUSTER
+
+#define STBCC__CLUSTER_X_FOR_COORD_X(x) ((x) >> STBCC_CLUSTER_SIZE_X_LOG2)
+#define STBCC__CLUSTER_Y_FOR_COORD_Y(y) ((y) >> STBCC_CLUSTER_SIZE_Y_LOG2)
+
+#define STBCC__MAP_BYTE_MASK(x,y) (1 << ((x) & 7))
+#define STBCC__MAP_BYTE(g,x,y) ((g)->map[y][(x) >> 3])
+#define STBCC__MAP_OPEN(g,x,y) (STBCC__MAP_BYTE(g,x,y) & STBCC__MAP_BYTE_MASK(x,y))
+
+typedef unsigned short stbcc__clumpid;
+typedef unsigned char stbcc__verify_max_clumps[STBCC__MAX_CLUMPS_PER_CLUSTER < (1 << (8*sizeof(stbcc__clumpid))) ? 1 : -1];
+
+#define STBCC__MAX_EXITS_PER_CLUSTER (STBCC__CLUSTER_SIZE_X + STBCC__CLUSTER_SIZE_Y) // 64 for 32x32
+#define STBCC__MAX_EXITS_PER_CLUMP (STBCC__CLUSTER_SIZE_X + STBCC__CLUSTER_SIZE_Y) // 64 for 32x32
+#define STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER (STBCC__MAX_EXITS_PER_CLUMP)
+
+// 2^19 * 2^6 => 2^25 exits => 2^26 => 64MB for 1024x1024
+
+// Logic for above on 4x4 grid:
+//
+// Many clumps: One clump:
+// + + + +
+// +X.X. +XX.X+
+// .X.X+ .XXX
+// +X.X. XXX.
+// .X.X+ +X.XX+
+// + + + +
+//
+// 8 exits either way
+
+typedef unsigned char stbcc__verify_max_exits[STBCC__MAX_EXITS_PER_CLUMP <= 256 ? 1 : -1];
+
+typedef struct
+{
+ unsigned short clump_index:12;
+ signed short cluster_dx:2;
+ signed short cluster_dy:2;
+} stbcc__relative_clumpid;
+
+typedef union
+{
+ struct {
+ unsigned int clump_index:12;
+ unsigned int cluster_x:10;
+ unsigned int cluster_y:10;
+ } f;
+ unsigned int c;
+} stbcc__global_clumpid;
+
+// rebuilt cluster 3,4
+
+// what changes in cluster 2,4
+
+typedef struct
+{
+ stbcc__global_clumpid global_label; // 4
+ unsigned char num_adjacent; // 1
+ unsigned char max_adjacent; // 1
+ unsigned char adjacent_clump_list_index; // 1
+ unsigned char reserved;
+} stbcc__clump; // 8
+
+#define STBCC__CLUSTER_ADJACENCY_COUNT (STBCC__MAX_EXITS_PER_CLUSTER*2)
+typedef struct
+{
+ short num_clumps;
+ unsigned char num_edge_clumps;
+ unsigned char rebuild_adjacency;
+ stbcc__clump clump[STBCC__MAX_CLUMPS_PER_CLUSTER]; // 8 * 2^9 = 4KB
+ stbcc__relative_clumpid adjacency_storage[STBCC__CLUSTER_ADJACENCY_COUNT]; // 256 bytes
+} stbcc__cluster;
+
+struct st_stbcc_grid
+{
+ int w,h,cw,ch;
+ int in_batched_update;
+ //unsigned char cluster_dirty[STBCC__CLUSTER_COUNT_Y][STBCC__CLUSTER_COUNT_X]; // could bitpack, but: 1K x 1K => 1KB
+ unsigned char map[STBCC__GRID_COUNT_Y][STBCC__MAP_STRIDE]; // 1K x 1K => 1K x 128 => 128KB
+ stbcc__clumpid clump_for_node[STBCC__GRID_COUNT_Y][STBCC__GRID_COUNT_X]; // 1K x 1K x 2 = 2MB
+ stbcc__cluster cluster[STBCC__CLUSTER_COUNT_Y][STBCC__CLUSTER_COUNT_X]; // 1K x 4.5KB = 4.5MB
+};
+
+int stbcc_query_grid_node_connection(stbcc_grid *g, int x1, int y1, int x2, int y2)
+{
+ stbcc__global_clumpid label1, label2;
+ stbcc__clumpid c1 = g->clump_for_node[y1][x1];
+ stbcc__clumpid c2 = g->clump_for_node[y2][x2];
+ int cx1 = STBCC__CLUSTER_X_FOR_COORD_X(x1);
+ int cy1 = STBCC__CLUSTER_Y_FOR_COORD_Y(y1);
+ int cx2 = STBCC__CLUSTER_X_FOR_COORD_X(x2);
+ int cy2 = STBCC__CLUSTER_Y_FOR_COORD_Y(y2);
+ assert(!g->in_batched_update);
+ if (c1 == STBCC__NULL_CLUMPID || c2 == STBCC__NULL_CLUMPID)
+ return 0;
+ label1 = g->cluster[cy1][cx1].clump[c1].global_label;
+ label2 = g->cluster[cy2][cx2].clump[c2].global_label;
+ if (label1.c == label2.c)
+ return 1;
+ return 0;
+}
+
+int stbcc_query_grid_open(stbcc_grid *g, int x, int y)
+{
+ return STBCC__MAP_OPEN(g, x, y) != 0;
+}
+
+unsigned int stbcc_get_unique_id(stbcc_grid *g, int x, int y)
+{
+ stbcc__clumpid c = g->clump_for_node[y][x];
+ int cx = STBCC__CLUSTER_X_FOR_COORD_X(x);
+ int cy = STBCC__CLUSTER_Y_FOR_COORD_Y(y);
+ assert(!g->in_batched_update);
+ if (c == STBCC__NULL_CLUMPID) return STBCC_NULL_UNIQUE_ID;
+ return g->cluster[cy][cx].clump[c].global_label.c;
+}
+
+typedef struct
+{
+ unsigned char x,y;
+} stbcc__tinypoint;
+
+typedef struct
+{
+ stbcc__tinypoint parent[STBCC__CLUSTER_SIZE_Y][STBCC__CLUSTER_SIZE_X]; // 32x32 => 2KB
+ stbcc__clumpid label[STBCC__CLUSTER_SIZE_Y][STBCC__CLUSTER_SIZE_X];
+} stbcc__cluster_build_info;
+
+static void stbcc__build_clumps_for_cluster(stbcc_grid *g, int cx, int cy);
+static void stbcc__remove_connections_to_adjacent_cluster(stbcc_grid *g, int cx, int cy, int dx, int dy);
+static void stbcc__add_connections_to_adjacent_cluster(stbcc_grid *g, int cx, int cy, int dx, int dy);
+
+static stbcc__global_clumpid stbcc__clump_find(stbcc_grid *g, stbcc__global_clumpid n)
+{
+ stbcc__global_clumpid q;
+ stbcc__clump *c = &g->cluster[n.f.cluster_y][n.f.cluster_x].clump[n.f.clump_index];
+
+ if (c->global_label.c == n.c)
+ return n;
+
+ q = stbcc__clump_find(g, c->global_label);
+ c->global_label = q;
+ return q;
+}
+
+typedef struct
+{
+ unsigned int cluster_x;
+ unsigned int cluster_y;
+ unsigned int clump_index;
+} stbcc__unpacked_clumpid;
+
+static void stbcc__clump_union(stbcc_grid *g, stbcc__unpacked_clumpid m, int x, int y, int idx)
+{
+ stbcc__clump *mc = &g->cluster[m.cluster_y][m.cluster_x].clump[m.clump_index];
+ stbcc__clump *nc = &g->cluster[y][x].clump[idx];
+ stbcc__global_clumpid mp = stbcc__clump_find(g, mc->global_label);
+ stbcc__global_clumpid np = stbcc__clump_find(g, nc->global_label);
+
+ if (mp.c == np.c)
+ return;
+
+ g->cluster[mp.f.cluster_y][mp.f.cluster_x].clump[mp.f.clump_index].global_label = np;
+}
+
+static void stbcc__build_connected_components_for_clumps(stbcc_grid *g)
+{
+ int i,j,k,h;
+
+ for (j=0; j < STBCC__CLUSTER_COUNT_Y; ++j) {
+ for (i=0; i < STBCC__CLUSTER_COUNT_X; ++i) {
+ stbcc__cluster *cluster = &g->cluster[j][i];
+ for (k=0; k < (int) cluster->num_edge_clumps; ++k) {
+ stbcc__global_clumpid m;
+ m.f.clump_index = k;
+ m.f.cluster_x = i;
+ m.f.cluster_y = j;
+ assert((int) m.f.clump_index == k && (int) m.f.cluster_x == i && (int) m.f.cluster_y == j);
+ cluster->clump[k].global_label = m;
+ }
+ }
+ }
+
+ for (j=0; j < STBCC__CLUSTER_COUNT_Y; ++j) {
+ for (i=0; i < STBCC__CLUSTER_COUNT_X; ++i) {
+ stbcc__cluster *cluster = &g->cluster[j][i];
+ for (k=0; k < (int) cluster->num_edge_clumps; ++k) {
+ stbcc__clump *clump = &cluster->clump[k];
+ stbcc__unpacked_clumpid m;
+ stbcc__relative_clumpid *adj;
+ m.clump_index = k;
+ m.cluster_x = i;
+ m.cluster_y = j;
+ adj = &cluster->adjacency_storage[clump->adjacent_clump_list_index];
+ for (h=0; h < clump->num_adjacent; ++h) {
+ unsigned int clump_index = adj[h].clump_index;
+ unsigned int x = adj[h].cluster_dx + i;
+ unsigned int y = adj[h].cluster_dy + j;
+ stbcc__clump_union(g, m, x, y, clump_index);
+ }
+ }
+ }
+ }
+
+ for (j=0; j < STBCC__CLUSTER_COUNT_Y; ++j) {
+ for (i=0; i < STBCC__CLUSTER_COUNT_X; ++i) {
+ stbcc__cluster *cluster = &g->cluster[j][i];
+ for (k=0; k < (int) cluster->num_edge_clumps; ++k) {
+ stbcc__global_clumpid m;
+ m.f.clump_index = k;
+ m.f.cluster_x = i;
+ m.f.cluster_y = j;
+ stbcc__clump_find(g, m);
+ }
+ }
+ }
+}
+
+static void stbcc__build_all_connections_for_cluster(stbcc_grid *g, int cx, int cy)
+{
+ // in this particular case, we are fully non-incremental. that means we
+ // can discover the correct sizes for the arrays, but requires we build
+ // the data into temporary data structures, or just count the sizes, so
+ // for simplicity we do the latter
+ stbcc__cluster *cluster = &g->cluster[cy][cx];
+ unsigned char connected[STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER][STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER/8]; // 64 x 8 => 1KB
+ unsigned char num_adj[STBCC__MAX_CLUMPS_PER_CLUSTER] = { 0 };
+ int x = cx * STBCC__CLUSTER_SIZE_X;
+ int y = cy * STBCC__CLUSTER_SIZE_Y;
+ int step_x, step_y=0, i, j, k, n, m, dx, dy, total;
+ int extra;
+
+ g->cluster[cy][cx].rebuild_adjacency = 0;
+
+ total = 0;
+ for (m=0; m < 4; ++m) {
+ switch (m) {
+ case 0:
+ dx = 1, dy = 0;
+ step_x = 0, step_y = 1;
+ i = STBCC__CLUSTER_SIZE_X-1;
+ j = 0;
+ n = STBCC__CLUSTER_SIZE_Y;
+ break;
+ case 1:
+ dx = -1, dy = 0;
+ i = 0;
+ j = 0;
+ step_x = 0;
+ step_y = 1;
+ n = STBCC__CLUSTER_SIZE_Y;
+ break;
+ case 2:
+ dy = -1, dx = 0;
+ i = 0;
+ j = 0;
+ step_x = 1;
+ step_y = 0;
+ n = STBCC__CLUSTER_SIZE_X;
+ break;
+ case 3:
+ dy = 1, dx = 0;
+ i = 0;
+ j = STBCC__CLUSTER_SIZE_Y-1;
+ step_x = 1;
+ step_y = 0;
+ n = STBCC__CLUSTER_SIZE_X;
+ break;
+ }
+
+ if (cx+dx < 0 || cx+dx >= g->cw || cy+dy < 0 || cy+dy >= g->ch)
+ continue;
+
+ memset(connected, 0, sizeof(connected));
+ for (k=0; k < n; ++k) {
+ if (STBCC__MAP_OPEN(g, x+i, y+j) && STBCC__MAP_OPEN(g, x+i+dx, y+j+dy)) {
+ stbcc__clumpid src = g->clump_for_node[y+j][x+i];
+ stbcc__clumpid dest = g->clump_for_node[y+j+dy][x+i+dx];
+ if (0 == (connected[src][dest>>3] & (1 << (dest & 7)))) {
+ connected[src][dest>>3] |= 1 << (dest & 7);
+ ++num_adj[src];
+ ++total;
+ }
+ }
+ i += step_x;
+ j += step_y;
+ }
+ }
+
+ assert(total <= STBCC__CLUSTER_ADJACENCY_COUNT);
+
+ // decide how to apportion unused adjacency slots; only clumps that lie
+ // on the edges of the cluster need adjacency slots, so divide them up
+ // evenly between those clumps
+
+ // we want:
+ // extra = (STBCC__CLUSTER_ADJACENCY_COUNT - total) / cluster->num_edge_clumps;
+ // but we efficiently approximate this without a divide, because
+ // ignoring edge-vs-non-edge with 'num_adj[i]*2' was faster than
+ // 'num_adj[i]+extra' with the divide
+ if (total + (cluster->num_edge_clumps<<2) <= STBCC__CLUSTER_ADJACENCY_COUNT)
+ extra = 4;
+ else if (total + (cluster->num_edge_clumps<<1) <= STBCC__CLUSTER_ADJACENCY_COUNT)
+ extra = 2;
+ else if (total + (cluster->num_edge_clumps<<0) <= STBCC__CLUSTER_ADJACENCY_COUNT)
+ extra = 1;
+ else
+ extra = 0;
+
+ total = 0;
+ for (i=0; i < (int) cluster->num_edge_clumps; ++i) {
+ int alloc = num_adj[i]+extra;
+ if (alloc > STBCC__MAX_EXITS_PER_CLUSTER)
+ alloc = STBCC__MAX_EXITS_PER_CLUSTER;
+ assert(total < 256); // must fit in byte
+ cluster->clump[i].adjacent_clump_list_index = (unsigned char) total;
+ cluster->clump[i].max_adjacent = alloc;
+ cluster->clump[i].num_adjacent = 0;
+ total += alloc;
+ }
+ assert(total <= STBCC__CLUSTER_ADJACENCY_COUNT);
+
+ stbcc__add_connections_to_adjacent_cluster(g, cx, cy, -1, 0);
+ stbcc__add_connections_to_adjacent_cluster(g, cx, cy, 1, 0);
+ stbcc__add_connections_to_adjacent_cluster(g, cx, cy, 0,-1);
+ stbcc__add_connections_to_adjacent_cluster(g, cx, cy, 0, 1);
+ // make sure all of the above succeeded.
+ assert(g->cluster[cy][cx].rebuild_adjacency == 0);
+}
+
+static void stbcc__add_connections_to_adjacent_cluster_with_rebuild(stbcc_grid *g, int cx, int cy, int dx, int dy)
+{
+ if (cx >= 0 && cx < g->cw && cy >= 0 && cy < g->ch) {
+ stbcc__add_connections_to_adjacent_cluster(g, cx, cy, dx, dy);
+ if (g->cluster[cy][cx].rebuild_adjacency)
+ stbcc__build_all_connections_for_cluster(g, cx, cy);
+ }
+}
+
+void stbcc_update_grid(stbcc_grid *g, int x, int y, int solid)
+{
+ int cx,cy;
+
+ if (!solid) {
+ if (STBCC__MAP_OPEN(g,x,y))
+ return;
+ } else {
+ if (!STBCC__MAP_OPEN(g,x,y))
+ return;
+ }
+
+ cx = STBCC__CLUSTER_X_FOR_COORD_X(x);
+ cy = STBCC__CLUSTER_Y_FOR_COORD_Y(y);
+
+ stbcc__remove_connections_to_adjacent_cluster(g, cx-1, cy, 1, 0);
+ stbcc__remove_connections_to_adjacent_cluster(g, cx+1, cy, -1, 0);
+ stbcc__remove_connections_to_adjacent_cluster(g, cx, cy-1, 0, 1);
+ stbcc__remove_connections_to_adjacent_cluster(g, cx, cy+1, 0,-1);
+
+ if (!solid)
+ STBCC__MAP_BYTE(g,x,y) |= STBCC__MAP_BYTE_MASK(x,y);
+ else
+ STBCC__MAP_BYTE(g,x,y) &= ~STBCC__MAP_BYTE_MASK(x,y);
+
+ stbcc__build_clumps_for_cluster(g, cx, cy);
+ stbcc__build_all_connections_for_cluster(g, cx, cy);
+
+ stbcc__add_connections_to_adjacent_cluster_with_rebuild(g, cx-1, cy, 1, 0);
+ stbcc__add_connections_to_adjacent_cluster_with_rebuild(g, cx+1, cy, -1, 0);
+ stbcc__add_connections_to_adjacent_cluster_with_rebuild(g, cx, cy-1, 0, 1);
+ stbcc__add_connections_to_adjacent_cluster_with_rebuild(g, cx, cy+1, 0,-1);
+
+ if (!g->in_batched_update)
+ stbcc__build_connected_components_for_clumps(g);
+ #if 0
+ else
+ g->cluster_dirty[cy][cx] = 1;
+ #endif
+}
+
+void stbcc_update_batch_begin(stbcc_grid *g)
+{
+ assert(!g->in_batched_update);
+ g->in_batched_update = 1;
+}
+
+void stbcc_update_batch_end(stbcc_grid *g)
+{
+ assert(g->in_batched_update);
+ g->in_batched_update = 0;
+ stbcc__build_connected_components_for_clumps(g); // @OPTIMIZE: only do this if update was non-empty
+}
+
+size_t stbcc_grid_sizeof(void)
+{
+ return sizeof(stbcc_grid);
+}
+
+void stbcc_init_grid(stbcc_grid *g, unsigned char *map, int w, int h)
+{
+ int i,j,k;
+ assert(w % STBCC__CLUSTER_SIZE_X == 0);
+ assert(h % STBCC__CLUSTER_SIZE_Y == 0);
+ assert(w % 8 == 0);
+
+ g->w = w;
+ g->h = h;
+ g->cw = w >> STBCC_CLUSTER_SIZE_X_LOG2;
+ g->ch = h >> STBCC_CLUSTER_SIZE_Y_LOG2;
+ g->in_batched_update = 0;
+
+ #if 0
+ for (j=0; j < STBCC__CLUSTER_COUNT_Y; ++j)
+ for (i=0; i < STBCC__CLUSTER_COUNT_X; ++i)
+ g->cluster_dirty[j][i] = 0;
+ #endif
+
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; i += 8) {
+ unsigned char c = 0;
+ for (k=0; k < 8; ++k)
+ if (map[j*w + (i+k)] == 0)
+ c |= (1 << k);
+ g->map[j][i>>3] = c;
+ }
+ }
+
+ for (j=0; j < g->ch; ++j)
+ for (i=0; i < g->cw; ++i)
+ stbcc__build_clumps_for_cluster(g, i, j);
+
+ for (j=0; j < g->ch; ++j)
+ for (i=0; i < g->cw; ++i)
+ stbcc__build_all_connections_for_cluster(g, i, j);
+
+ stbcc__build_connected_components_for_clumps(g);
+
+ for (j=0; j < g->h; ++j)
+ for (i=0; i < g->w; ++i)
+ assert(g->clump_for_node[j][i] <= STBCC__NULL_CLUMPID);
+}
+
+
+static void stbcc__add_clump_connection(stbcc_grid *g, int x1, int y1, int x2, int y2)
+{
+ stbcc__cluster *cluster;
+ stbcc__clump *clump;
+
+ int cx1 = STBCC__CLUSTER_X_FOR_COORD_X(x1);
+ int cy1 = STBCC__CLUSTER_Y_FOR_COORD_Y(y1);
+ int cx2 = STBCC__CLUSTER_X_FOR_COORD_X(x2);
+ int cy2 = STBCC__CLUSTER_Y_FOR_COORD_Y(y2);
+
+ stbcc__clumpid c1 = g->clump_for_node[y1][x1];
+ stbcc__clumpid c2 = g->clump_for_node[y2][x2];
+
+ stbcc__relative_clumpid rc;
+
+ assert(cx1 != cx2 || cy1 != cy2);
+ assert(abs(cx1-cx2) + abs(cy1-cy2) == 1);
+
+ // add connection to c2 in c1
+
+ rc.clump_index = c2;
+ rc.cluster_dx = x2-x1;
+ rc.cluster_dy = y2-y1;
+
+ cluster = &g->cluster[cy1][cx1];
+ clump = &cluster->clump[c1];
+ assert(clump->num_adjacent <= clump->max_adjacent);
+ if (clump->num_adjacent == clump->max_adjacent)
+ g->cluster[cy1][cx1].rebuild_adjacency = 1;
+ else {
+ stbcc__relative_clumpid *adj = &cluster->adjacency_storage[clump->adjacent_clump_list_index];
+ assert(clump->num_adjacent < STBCC__MAX_EXITS_PER_CLUMP);
+ assert(clump->adjacent_clump_list_index + clump->num_adjacent <= STBCC__CLUSTER_ADJACENCY_COUNT);
+ adj[clump->num_adjacent++] = rc;
+ }
+}
+
+static void stbcc__remove_clump_connection(stbcc_grid *g, int x1, int y1, int x2, int y2)
+{
+ stbcc__cluster *cluster;
+ stbcc__clump *clump;
+ stbcc__relative_clumpid *adj;
+ int i;
+
+ int cx1 = STBCC__CLUSTER_X_FOR_COORD_X(x1);
+ int cy1 = STBCC__CLUSTER_Y_FOR_COORD_Y(y1);
+ int cx2 = STBCC__CLUSTER_X_FOR_COORD_X(x2);
+ int cy2 = STBCC__CLUSTER_Y_FOR_COORD_Y(y2);
+
+ stbcc__clumpid c1 = g->clump_for_node[y1][x1];
+ stbcc__clumpid c2 = g->clump_for_node[y2][x2];
+
+ stbcc__relative_clumpid rc;
+
+ assert(cx1 != cx2 || cy1 != cy2);
+ assert(abs(cx1-cx2) + abs(cy1-cy2) == 1);
+
+ // add connection to c2 in c1
+
+ rc.clump_index = c2;
+ rc.cluster_dx = x2-x1;
+ rc.cluster_dy = y2-y1;
+
+ cluster = &g->cluster[cy1][cx1];
+ clump = &cluster->clump[c1];
+ adj = &cluster->adjacency_storage[clump->adjacent_clump_list_index];
+
+ for (i=0; i < clump->num_adjacent; ++i)
+ if (rc.clump_index == adj[i].clump_index &&
+ rc.cluster_dx == adj[i].cluster_dx &&
+ rc.cluster_dy == adj[i].cluster_dy)
+ break;
+
+ if (i < clump->num_adjacent)
+ adj[i] = adj[--clump->num_adjacent];
+ else
+ assert(0);
+}
+
+static void stbcc__add_connections_to_adjacent_cluster(stbcc_grid *g, int cx, int cy, int dx, int dy)
+{
+ unsigned char connected[STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER][STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER/8] = { { 0 } };
+ int x = cx * STBCC__CLUSTER_SIZE_X;
+ int y = cy * STBCC__CLUSTER_SIZE_Y;
+ int step_x, step_y=0, i, j, k, n;
+
+ if (cx < 0 || cx >= g->cw || cy < 0 || cy >= g->ch)
+ return;
+
+ if (cx+dx < 0 || cx+dx >= g->cw || cy+dy < 0 || cy+dy >= g->ch)
+ return;
+
+ if (g->cluster[cy][cx].rebuild_adjacency)
+ return;
+
+ assert(abs(dx) + abs(dy) == 1);
+
+ if (dx == 1) {
+ i = STBCC__CLUSTER_SIZE_X-1;
+ j = 0;
+ step_x = 0;
+ step_y = 1;
+ n = STBCC__CLUSTER_SIZE_Y;
+ } else if (dx == -1) {
+ i = 0;
+ j = 0;
+ step_x = 0;
+ step_y = 1;
+ n = STBCC__CLUSTER_SIZE_Y;
+ } else if (dy == -1) {
+ i = 0;
+ j = 0;
+ step_x = 1;
+ step_y = 0;
+ n = STBCC__CLUSTER_SIZE_X;
+ } else if (dy == 1) {
+ i = 0;
+ j = STBCC__CLUSTER_SIZE_Y-1;
+ step_x = 1;
+ step_y = 0;
+ n = STBCC__CLUSTER_SIZE_X;
+ } else {
+ assert(0);
+ return;
+ }
+
+ for (k=0; k < n; ++k) {
+ if (STBCC__MAP_OPEN(g, x+i, y+j) && STBCC__MAP_OPEN(g, x+i+dx, y+j+dy)) {
+ stbcc__clumpid src = g->clump_for_node[y+j][x+i];
+ stbcc__clumpid dest = g->clump_for_node[y+j+dy][x+i+dx];
+ if (0 == (connected[src][dest>>3] & (1 << (dest & 7)))) {
+ assert((dest>>3) < sizeof(connected));
+ connected[src][dest>>3] |= 1 << (dest & 7);
+ stbcc__add_clump_connection(g, x+i, y+j, x+i+dx, y+j+dy);
+ if (g->cluster[cy][cx].rebuild_adjacency)
+ break;
+ }
+ }
+ i += step_x;
+ j += step_y;
+ }
+}
+
+static void stbcc__remove_connections_to_adjacent_cluster(stbcc_grid *g, int cx, int cy, int dx, int dy)
+{
+ unsigned char disconnected[STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER][STBCC__MAX_EDGE_CLUMPS_PER_CLUSTER/8] = { { 0 } };
+ int x = cx * STBCC__CLUSTER_SIZE_X;
+ int y = cy * STBCC__CLUSTER_SIZE_Y;
+ int step_x, step_y=0, i, j, k, n;
+
+ if (cx < 0 || cx >= g->cw || cy < 0 || cy >= g->ch)
+ return;
+
+ if (cx+dx < 0 || cx+dx >= g->cw || cy+dy < 0 || cy+dy >= g->ch)
+ return;
+
+ assert(abs(dx) + abs(dy) == 1);
+
+ if (dx == 1) {
+ i = STBCC__CLUSTER_SIZE_X-1;
+ j = 0;
+ step_x = 0;
+ step_y = 1;
+ n = STBCC__CLUSTER_SIZE_Y;
+ } else if (dx == -1) {
+ i = 0;
+ j = 0;
+ step_x = 0;
+ step_y = 1;
+ n = STBCC__CLUSTER_SIZE_Y;
+ } else if (dy == -1) {
+ i = 0;
+ j = 0;
+ step_x = 1;
+ step_y = 0;
+ n = STBCC__CLUSTER_SIZE_X;
+ } else if (dy == 1) {
+ i = 0;
+ j = STBCC__CLUSTER_SIZE_Y-1;
+ step_x = 1;
+ step_y = 0;
+ n = STBCC__CLUSTER_SIZE_X;
+ } else {
+ assert(0);
+ return;
+ }
+
+ for (k=0; k < n; ++k) {
+ if (STBCC__MAP_OPEN(g, x+i, y+j) && STBCC__MAP_OPEN(g, x+i+dx, y+j+dy)) {
+ stbcc__clumpid src = g->clump_for_node[y+j][x+i];
+ stbcc__clumpid dest = g->clump_for_node[y+j+dy][x+i+dx];
+ if (0 == (disconnected[src][dest>>3] & (1 << (dest & 7)))) {
+ disconnected[src][dest>>3] |= 1 << (dest & 7);
+ stbcc__remove_clump_connection(g, x+i, y+j, x+i+dx, y+j+dy);
+ }
+ }
+ i += step_x;
+ j += step_y;
+ }
+}
+
+static stbcc__tinypoint stbcc__incluster_find(stbcc__cluster_build_info *cbi, int x, int y)
+{
+ stbcc__tinypoint p,q;
+ p = cbi->parent[y][x];
+ if (p.x == x && p.y == y)
+ return p;
+ q = stbcc__incluster_find(cbi, p.x, p.y);
+ cbi->parent[y][x] = q;
+ return q;
+}
+
+static void stbcc__incluster_union(stbcc__cluster_build_info *cbi, int x1, int y1, int x2, int y2)
+{
+ stbcc__tinypoint p = stbcc__incluster_find(cbi, x1,y1);
+ stbcc__tinypoint q = stbcc__incluster_find(cbi, x2,y2);
+
+ if (p.x == q.x && p.y == q.y)
+ return;
+
+ cbi->parent[p.y][p.x] = q;
+}
+
+static void stbcc__switch_root(stbcc__cluster_build_info *cbi, int x, int y, stbcc__tinypoint p)
+{
+ cbi->parent[p.y][p.x].x = x;
+ cbi->parent[p.y][p.x].y = y;
+ cbi->parent[y][x].x = x;
+ cbi->parent[y][x].y = y;
+}
+
+static void stbcc__build_clumps_for_cluster(stbcc_grid *g, int cx, int cy)
+{
+ stbcc__cluster *c;
+ stbcc__cluster_build_info cbi;
+ int label=0;
+ int i,j;
+ int x = cx * STBCC__CLUSTER_SIZE_X;
+ int y = cy * STBCC__CLUSTER_SIZE_Y;
+
+ // set initial disjoint set forest state
+ for (j=0; j < STBCC__CLUSTER_SIZE_Y; ++j) {
+ for (i=0; i < STBCC__CLUSTER_SIZE_X; ++i) {
+ cbi.parent[j][i].x = i;
+ cbi.parent[j][i].y = j;
+ }
+ }
+
+ // join all sets that are connected
+ for (j=0; j < STBCC__CLUSTER_SIZE_Y; ++j) {
+ // check down only if not on bottom row
+ if (j < STBCC__CLUSTER_SIZE_Y-1)
+ for (i=0; i < STBCC__CLUSTER_SIZE_X; ++i)
+ if (STBCC__MAP_OPEN(g,x+i,y+j) && STBCC__MAP_OPEN(g,x+i ,y+j+1))
+ stbcc__incluster_union(&cbi, i,j, i,j+1);
+ // check right for everything but rightmost column
+ for (i=0; i < STBCC__CLUSTER_SIZE_X-1; ++i)
+ if (STBCC__MAP_OPEN(g,x+i,y+j) && STBCC__MAP_OPEN(g,x+i+1,y+j ))
+ stbcc__incluster_union(&cbi, i,j, i+1,j);
+ }
+
+ // label all non-empty clumps along the edges so that all edge clumps come
+ // first in the list; in the degenerate case this lets us skip traversing
+ // non-edge clumps. Because the first pass labels only leaders, we first
+ // swap the leader out to the edge.
+
+ // first put solid labels on all the edges; these will get overwritten if they're open
+ for (j=0; j < STBCC__CLUSTER_SIZE_Y; ++j)
+ cbi.label[j][0] = cbi.label[j][STBCC__CLUSTER_SIZE_X-1] = STBCC__NULL_CLUMPID;
+ for (i=0; i < STBCC__CLUSTER_SIZE_X; ++i)
+ cbi.label[0][i] = cbi.label[STBCC__CLUSTER_SIZE_Y-1][i] = STBCC__NULL_CLUMPID;
+
+ for (j=0; j < STBCC__CLUSTER_SIZE_Y; ++j) {
+ i = 0;
+ if (STBCC__MAP_OPEN(g, x+i, y+j)) {
+ stbcc__tinypoint p = stbcc__incluster_find(&cbi, i,j);
+ if (p.x == i && p.y == j)
+ // if this is the leader, give it a label
+ cbi.label[j][i] = label++;
+ else if (!(p.x == 0 || p.x == STBCC__CLUSTER_SIZE_X-1 || p.y == 0 || p.y == STBCC__CLUSTER_SIZE_Y-1)) {
+ // if leader is in interior, promote this edge node to leader and label
+ stbcc__switch_root(&cbi, i, j, p);
+ cbi.label[j][i] = label++;
+ }
+ // else if leader is on edge, do nothing (it'll get labelled when we reach it)
+ }
+ i = STBCC__CLUSTER_SIZE_X-1;
+ if (STBCC__MAP_OPEN(g, x+i, y+j)) {
+ stbcc__tinypoint p = stbcc__incluster_find(&cbi, i,j);
+ if (p.x == i && p.y == j)
+ cbi.label[j][i] = label++;
+ else if (!(p.x == 0 || p.x == STBCC__CLUSTER_SIZE_X-1 || p.y == 0 || p.y == STBCC__CLUSTER_SIZE_Y-1)) {
+ stbcc__switch_root(&cbi, i, j, p);
+ cbi.label[j][i] = label++;
+ }
+ }
+ }
+
+ for (i=1; i < STBCC__CLUSTER_SIZE_X-1; ++i) {
+ j = 0;
+ if (STBCC__MAP_OPEN(g, x+i, y+j)) {
+ stbcc__tinypoint p = stbcc__incluster_find(&cbi, i,j);
+ if (p.x == i && p.y == j)
+ cbi.label[j][i] = label++;
+ else if (!(p.x == 0 || p.x == STBCC__CLUSTER_SIZE_X-1 || p.y == 0 || p.y == STBCC__CLUSTER_SIZE_Y-1)) {
+ stbcc__switch_root(&cbi, i, j, p);
+ cbi.label[j][i] = label++;
+ }
+ }
+ j = STBCC__CLUSTER_SIZE_Y-1;
+ if (STBCC__MAP_OPEN(g, x+i, y+j)) {
+ stbcc__tinypoint p = stbcc__incluster_find(&cbi, i,j);
+ if (p.x == i && p.y == j)
+ cbi.label[j][i] = label++;
+ else if (!(p.x == 0 || p.x == STBCC__CLUSTER_SIZE_X-1 || p.y == 0 || p.y == STBCC__CLUSTER_SIZE_Y-1)) {
+ stbcc__switch_root(&cbi, i, j, p);
+ cbi.label[j][i] = label++;
+ }
+ }
+ }
+
+ c = &g->cluster[cy][cx];
+ c->num_edge_clumps = label;
+
+ // label any internal clumps
+ for (j=1; j < STBCC__CLUSTER_SIZE_Y-1; ++j) {
+ for (i=1; i < STBCC__CLUSTER_SIZE_X-1; ++i) {
+ stbcc__tinypoint p = cbi.parent[j][i];
+ if (p.x == i && p.y == j) {
+ if (STBCC__MAP_OPEN(g,x+i,y+j))
+ cbi.label[j][i] = label++;
+ else
+ cbi.label[j][i] = STBCC__NULL_CLUMPID;
+ }
+ }
+ }
+
+ // label all other nodes
+ for (j=0; j < STBCC__CLUSTER_SIZE_Y; ++j) {
+ for (i=0; i < STBCC__CLUSTER_SIZE_X; ++i) {
+ stbcc__tinypoint p = stbcc__incluster_find(&cbi, i,j);
+ if (p.x != i || p.y != j) {
+ if (STBCC__MAP_OPEN(g,x+i,y+j))
+ cbi.label[j][i] = cbi.label[p.y][p.x];
+ }
+ if (STBCC__MAP_OPEN(g,x+i,y+j))
+ assert(cbi.label[j][i] != STBCC__NULL_CLUMPID);
+ }
+ }
+
+ c->num_clumps = label;
+
+ for (i=0; i < label; ++i) {
+ c->clump[i].num_adjacent = 0;
+ c->clump[i].max_adjacent = 0;
+ }
+
+ for (j=0; j < STBCC__CLUSTER_SIZE_Y; ++j)
+ for (i=0; i < STBCC__CLUSTER_SIZE_X; ++i) {
+ g->clump_for_node[y+j][x+i] = cbi.label[j][i]; // @OPTIMIZE: remove cbi.label entirely
+ assert(g->clump_for_node[y+j][x+i] <= STBCC__NULL_CLUMPID);
+ }
+
+ // set the global label for all interior clumps since they can't have connections,
+ // so we don't have to do this on the global pass (brings from O(N) to O(N^0.75))
+ for (i=(int) c->num_edge_clumps; i < (int) c->num_clumps; ++i) {
+ stbcc__global_clumpid gc;
+ gc.f.cluster_x = cx;
+ gc.f.cluster_y = cy;
+ gc.f.clump_index = i;
+ c->clump[i].global_label = gc;
+ }
+
+ c->rebuild_adjacency = 1; // flag that it has no valid adjacency data
+}
+
+#endif // STB_CONNECTED_COMPONENTS_IMPLEMENTATION
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_divide.h b/vendor/stb/stb_divide.h
new file mode 100644
index 0000000..6a51e3f
--- /dev/null
+++ b/vendor/stb/stb_divide.h
@@ -0,0 +1,433 @@
+// stb_divide.h - v0.94 - public domain - Sean Barrett, Feb 2010
+// Three kinds of divide/modulus of signed integers.
+//
+// HISTORY
+//
+// v0.94 Fix integer overflow issues
+// v0.93 2020-02-02 Write useful exit() value from main()
+// v0.92 2019-02-25 Fix warning
+// v0.91 2010-02-27 Fix euclidean division by INT_MIN for non-truncating C
+// Check result with 64-bit math to catch such cases
+// v0.90 2010-02-24 First public release
+//
+// USAGE
+//
+// In *ONE* source file, put:
+//
+// #define STB_DIVIDE_IMPLEMENTATION
+// // #define C_INTEGER_DIVISION_TRUNCATES // see Note 1
+// // #define C_INTEGER_DIVISION_FLOORS // see Note 2
+// #include "stb_divide.h"
+//
+// Other source files should just include stb_divide.h
+//
+// Note 1: On platforms/compilers that you know signed C division
+// truncates, you can #define C_INTEGER_DIVISION_TRUNCATES.
+//
+// Note 2: On platforms/compilers that you know signed C division
+// floors (rounds to negative infinity), you can #define
+// C_INTEGER_DIVISION_FLOORS.
+//
+// You can #define STB_DIVIDE_TEST in which case the implementation
+// will generate a main() and compiling the result will create a
+// program that tests the implementation. Run it with no arguments
+// and any output indicates an error; run it with any argument and
+// it will also print the test results. Define STB_DIVIDE_TEST_64
+// to a 64-bit integer type to avoid overflows in the result-checking
+// which give false negatives.
+//
+// ABOUT
+//
+// This file provides three different consistent divide/mod pairs
+// implemented on top of arbitrary C/C++ division, including correct
+// handling of overflow of intermediate calculations:
+//
+// trunc: a/b truncates to 0, a%b has same sign as a
+// floor: a/b truncates to -inf, a%b has same sign as b
+// eucl: a/b truncates to sign(b)*inf, a%b is non-negative
+//
+// Not necessarily optimal; I tried to keep it generally efficient,
+// but there may be better ways.
+//
+// Briefly, for those who are not familiar with the problem, we note
+// the reason these divides exist and are interesting:
+//
+// 'trunc' is easy to implement in hardware (strip the signs,
+// compute, reapply the signs), thus is commonly defined
+// by many languages (including C99)
+//
+// 'floor' is simple to define and better behaved than trunc;
+// for example it divides integers into fixed-size buckets
+// without an extra-wide bucket at 0, and for a fixed
+// divisor N there are only |N| possible moduli.
+//
+// 'eucl' guarantees fixed-sized buckets *and* a non-negative
+// modulus and defines division to be whatever is needed
+// to achieve that result.
+//
+// See "The Euclidean definition of the functions div and mod"
+// by Raymond Boute (1992), or "Division and Modulus for Computer
+// Scientists" by Daan Leijen (2001)
+//
+// We assume of the built-in C division:
+// (a) modulus is the remainder for the corresponding division
+// (b) a/b truncates if a and b are the same sign
+//
+// Property (a) requires (a/b)*b + (a%b)==a, and is required by C.
+// Property (b) seems to be true of all hardware but is *not* satisfied
+// by the euclidean division operator we define, so it's possibly not
+// always true. If any such platform turns up, we can add more cases.
+// (Possibly only stb_div_trunc currently relies on property (b).)
+//
+// LICENSE
+//
+// See end of file for license information.
+
+
+#ifndef INCLUDE_STB_DIVIDE_H
+#define INCLUDE_STB_DIVIDE_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern int stb_div_trunc(int value_to_be_divided, int value_to_divide_by);
+extern int stb_div_floor(int value_to_be_divided, int value_to_divide_by);
+extern int stb_div_eucl (int value_to_be_divided, int value_to_divide_by);
+extern int stb_mod_trunc(int value_to_be_divided, int value_to_divide_by);
+extern int stb_mod_floor(int value_to_be_divided, int value_to_divide_by);
+extern int stb_mod_eucl (int value_to_be_divided, int value_to_divide_by);
+
+#ifdef __cplusplus
+}
+#endif
+
+#ifdef STB_DIVIDE_IMPLEMENTATION
+
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+ #ifndef C_INTEGER_DIVISION_TRUNCATES
+ #define C_INTEGER_DIVISION_TRUNCATES
+ #endif
+#endif
+
+#ifndef INT_MIN
+#include <limits.h> // if you have no limits.h, #define INT_MIN yourself
+#endif
+
+// the following macros are designed to allow testing
+// other platforms by simulating them
+#ifndef STB_DIVIDE_TEST_FLOOR
+ #define stb__div(a,b) ((a)/(b))
+ #define stb__mod(a,b) ((a)%(b))
+#else
+ // implement floor-style divide on trunc platform
+ #ifndef C_INTEGER_DIVISION_TRUNCATES
+ #error "floor test requires truncating division"
+ #endif
+ #undef C_INTEGER_DIVISION_TRUNCATES
+ int stb__div(int v1, int v2)
+ {
+ int q = v1/v2, r = v1%v2;
+ if ((r > 0 && v2 < 0) || (r < 0 && v2 > 0))
+ return q-1;
+ else
+ return q;
+ }
+
+ int stb__mod(int v1, int v2)
+ {
+ int r = v1%v2;
+ if ((r > 0 && v2 < 0) || (r < 0 && v2 > 0))
+ return r+v2;
+ else
+ return r;
+ }
+#endif
+
+int stb_div_trunc(int v1, int v2)
+{
+ #ifdef C_INTEGER_DIVISION_TRUNCATES
+ return v1/v2;
+ #else
+ if (v1 >= 0 && v2 <= 0)
+ return -stb__div(-v1,v2); // both negative to avoid overflow
+ if (v1 <= 0 && v2 >= 0)
+ if (v1 != INT_MIN)
+ return -stb__div(v1,-v2); // both negative to avoid overflow
+ else
+ return -stb__div(v1+v2,-v2)-1; // push v1 away from wrap point
+ else
+ return v1/v2; // same sign, so expect truncation
+ #endif
+}
+
+int stb_div_floor(int v1, int v2)
+{
+ #ifdef C_INTEGER_DIVISION_FLOORS
+ return v1/v2;
+ #else
+ if (v1 >= 0 && v2 < 0) {
+ if (v2 + 1 >= INT_MIN + v1) // check if increasing v1's magnitude overflows
+ return -stb__div((v2+1)-v1,v2); // nope, so just compute it
+ else
+ return -stb__div(-v1,v2) + ((-v1)%v2 ? -1 : 0);
+ }
+ if (v1 < 0 && v2 >= 0) {
+ if (v1 != INT_MIN) {
+ if (v1 + 1 >= INT_MIN + v2) // check if increasing v1's magnitude overflows
+ return -stb__div((v1+1)-v2,-v2); // nope, so just compute it
+ else
+ return -stb__div(-v1,v2) + (stb__mod(v1,-v2) ? -1 : 0);
+ } else // it must be possible to compute -(v1+v2) without overflowing
+ return -stb__div(-(v1+v2),v2) + (stb__mod(-(v1+v2),v2) ? -2 : -1);
+ } else
+ return v1/v2; // same sign, so expect truncation
+ #endif
+}
+
+int stb_div_eucl(int v1, int v2)
+{
+ int q,r;
+ #ifdef C_INTEGER_DIVISION_TRUNCATES
+ q = v1/v2;
+ r = v1%v2;
+ #else
+ // handle every quadrant separately, since we can't rely on the rounding of q and r
+ if (v1 >= 0)
+ if (v2 >= 0)
+ return stb__div(v1,v2);
+ else if (v2 != INT_MIN)
+ q = -stb__div(v1,-v2), r = stb__mod(v1,-v2);
+ else
+ q = 0, r = v1;
+ else if (v1 != INT_MIN)
+ if (v2 >= 0)
+ q = -stb__div(-v1,v2), r = -stb__mod(-v1,v2);
+ else if (v2 != INT_MIN)
+ q = stb__div(-v1,-v2), r = -stb__mod(-v1,-v2);
+ else // v2 is INT_MIN: we can't negate it, so choose q=1 and derive the remainder directly
+ q = 1, r = v1-q*v2;
+ else // if v1 is INT_MIN, we have to step away from the overflow point first
+ if (v2 >= 0)
+ q = -stb__div(-(v1+v2),v2)-1, r = -stb__mod(-(v1+v2),v2);
+ else if (v2 != INT_MIN)
+ q = stb__div(-(v1-v2),-v2)+1, r = -stb__mod(-(v1-v2),-v2);
+ else // for INT_MIN / INT_MIN, we need to be extra-careful to avoid overflow
+ q = 1, r = 0;
+ #endif
+ if (r >= 0)
+ return q;
+ else
+ return q + (v2 > 0 ? -1 : 1);
+}
+
+int stb_mod_trunc(int v1, int v2)
+{
+ #ifdef C_INTEGER_DIVISION_TRUNCATES
+ return v1%v2;
+ #else
+ if (v1 >= 0) { // modulus result should always be positive
+ int r = stb__mod(v1,v2);
+ if (r >= 0)
+ return r;
+ else
+ return r - (v2 < 0 ? v2 : -v2);
+ } else { // modulus result should always be negative
+ int r = stb__mod(v1,v2);
+ if (r <= 0)
+ return r;
+ else
+ return r + (v2 < 0 ? v2 : -v2);
+ }
+ #endif
+}
+
+int stb_mod_floor(int v1, int v2)
+{
+ #ifdef C_INTEGER_DIVISION_FLOORS
+ return v1%v2;
+ #else
+ if (v2 >= 0) { // result should always be positive
+ int r = stb__mod(v1,v2);
+ if (r >= 0)
+ return r;
+ else
+ return r + v2;
+ } else { // result should always be negative
+ int r = stb__mod(v1,v2);
+ if (r <= 0)
+ return r;
+ else
+ return r + v2;
+ }
+ #endif
+}
+
+int stb_mod_eucl(int v1, int v2)
+{
+ int r = stb__mod(v1,v2);
+
+ if (r >= 0)
+ return r;
+ else
+ return r - (v2 < 0 ? v2 : -v2); // negative abs() [to avoid overflow]
+}
+
+#ifdef STB_DIVIDE_TEST
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+
+int show=0;
+int err=0;
+
+void stbdiv_check(int q, int r, int a, int b, char *type, int dir)
+{
+ if ((dir > 0 && r < 0) || (dir < 0 && r > 0)) {
+ fprintf(stderr, "FAILED: %s(%d,%d) remainder %d in wrong direction\n", type,a,b,r);
+ err++;
+ } else
+ if (b != INT_MIN) // can't compute abs(), but if b==INT_MIN all remainders are valid
+ if (r <= -abs(b) || r >= abs(b)) {
+ fprintf(stderr, "FAILED: %s(%d,%d) remainder %d out of range\n", type,a,b,r);
+ err++;
+ }
+ #ifdef STB_DIVIDE_TEST_64
+ {
+ STB_DIVIDE_TEST_64 q64 = q, r64=r, a64=a, b64=b;
+ if (q64*b64+r64 != a64) {
+ fprintf(stderr, "FAILED: %s(%d,%d) remainder %d doesn't match quotient %d\n", type,a,b,r,q);
+ err++;
+ }
+ }
+ #else
+ if (q*b+r != a) {
+ fprintf(stderr, "FAILED: %s(%d,%d) remainder %d doesn't match quotient %d\n", type,a,b,r,q);
+ err++;
+ }
+ #endif
+}
+
+void test(int a, int b)
+{
+ int q,r;
+ if (show) printf("(%+11d,%+d) | ", a,b);
+ q = stb_div_trunc(a,b), r = stb_mod_trunc(a,b);
+ if (show) printf("(%+11d,%+2d) ", q,r); stbdiv_check(q,r,a,b, "trunc",a);
+ q = stb_div_floor(a,b), r = stb_mod_floor(a,b);
+ if (show) printf("(%+11d,%+2d) ", q,r); stbdiv_check(q,r,a,b, "floor",b);
+ q = stb_div_eucl (a,b), r = stb_mod_eucl (a,b);
+ if (show) printf("(%+11d,%+2d)\n", q,r); stbdiv_check(q,r,a,b, "euclidean",1);
+}
+
+void testh(int a, int b)
+{
+ int q,r;
+ if (show) printf("(%08x,%08x) |\n", a,b);
+ q = stb_div_trunc(a,b), r = stb_mod_trunc(a,b); stbdiv_check(q,r,a,b, "trunc",a);
+ if (show) printf(" (%08x,%08x)", q,r);
+ q = stb_div_floor(a,b), r = stb_mod_floor(a,b); stbdiv_check(q,r,a,b, "floor",b);
+ if (show) printf(" (%08x,%08x)", q,r);
+ q = stb_div_eucl (a,b), r = stb_mod_eucl (a,b); stbdiv_check(q,r,a,b, "euclidean",1);
+ if (show) printf(" (%08x,%08x)\n ", q,r);
+}
+
+int main(int argc, char **argv)
+{
+ if (argc > 1) show=1;
+
+ test(8,3);
+ test(8,-3);
+ test(-8,3);
+ test(-8,-3);
+ test(1,2);
+ test(1,-2);
+ test(-1,2);
+ test(-1,-2);
+ test(8,4);
+ test(8,-4);
+ test(-8,4);
+ test(-8,-4);
+
+ test(INT_MAX,1);
+ test(INT_MIN,1);
+ test(INT_MIN+1,1);
+ test(INT_MAX,-1);
+ //test(INT_MIN,-1); // this traps in MSVC, so we leave it untested
+ test(INT_MIN+1,-1);
+ test(INT_MIN,-2);
+ test(INT_MIN+1,2);
+ test(INT_MIN+1,-2);
+ test(INT_MAX,2);
+ test(INT_MAX,-2);
+ test(INT_MIN+1,2);
+ test(INT_MIN+1,-2);
+ test(INT_MIN,2);
+ test(INT_MIN,-2);
+ test(INT_MIN,7);
+ test(INT_MIN,-7);
+ test(INT_MIN+1,4);
+ test(INT_MIN+1,-4);
+
+ testh(-7, INT_MIN);
+ testh(-1, INT_MIN);
+ testh(1, INT_MIN);
+ testh(7, INT_MIN);
+
+ testh(INT_MAX-1, INT_MIN);
+ testh(INT_MAX, INT_MIN);
+ testh(INT_MIN, INT_MIN);
+ testh(INT_MIN+1, INT_MIN);
+
+ testh(INT_MAX-1, INT_MAX);
+ testh(INT_MAX , INT_MAX);
+ testh(INT_MIN , INT_MAX);
+ testh(INT_MIN+1, INT_MAX);
+
+ return err > 0 ? 1 : 0;
+}
+#endif // STB_DIVIDE_TEST
+#endif // STB_DIVIDE_IMPLEMENTATION
+#endif // INCLUDE_STB_DIVIDE_H
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_ds.h b/vendor/stb/stb_ds.h
new file mode 100644
index 0000000..e84c82d
--- /dev/null
+++ b/vendor/stb/stb_ds.h
@@ -0,0 +1,1895 @@
+/* stb_ds.h - v0.67 - public domain data structures - Sean Barrett 2019
+
+ This is a single-header-file library that provides easy-to-use
+ dynamic arrays and hash tables for C (also works in C++).
+
+ For a gentle introduction:
+ http://nothings.org/stb_ds
+
+ To use this library, do this in *one* C or C++ file:
+ #define STB_DS_IMPLEMENTATION
+ #include "stb_ds.h"
+
+TABLE OF CONTENTS
+
+ Table of Contents
+ Compile-time options
+ License
+ Documentation
+ Notes
+ Notes - Dynamic arrays
+ Notes - Hash maps
+ Credits
+
+COMPILE-TIME OPTIONS
+
+ #define STBDS_NO_SHORT_NAMES
+
+ This flag needs to be set globally.
+
+ By default stb_ds exposes shorter function names that are not qualified
+ with the "stbds_" prefix. If these names conflict with the names in your
+ code, define this flag.
+
+ #define STBDS_SIPHASH_2_4
+
+ This flag only needs to be set in the file containing #define STB_DS_IMPLEMENTATION.
+
+ By default stb_ds.h hashes using a weaker variant of SipHash and a custom hash for
+ 4- and 8-byte keys. On 64-bit platforms, you can define the above flag to force
+ stb_ds.h to use specification-compliant SipHash-2-4 for all keys. Doing so makes
+ hash table insertion about 20% slower on 4- and 8-byte keys, 5% slower on
+ 64-byte keys, and 10% slower on 256-byte keys on my test computer.
+
+ #define STBDS_REALLOC(context,ptr,size) better_realloc
+ #define STBDS_FREE(context,ptr) better_free
+
+ These defines only need to be set in the file containing #define STB_DS_IMPLEMENTATION.
+
+ By default stb_ds uses stdlib realloc() and free() for memory management. You can
+ substitute your own functions instead by defining these symbols. You must either
+ define both, or neither. Note that at the moment, 'context' will always be NULL.
+ @TODO add an array/hash initialization function that takes a memory context pointer.
+
+ #define STBDS_UNIT_TESTS
+
+ Defines a function stbds_unit_tests() that checks the functioning of the data structures.
+
+ Note that on older versions of gcc (e.g. 5.x.x) you may need to build with '-std=c++0x'
+ (or equivalently '-std=c++11') when using anonymous structures as seen on the web
+ page or in STBDS_UNIT_TESTS.
+
+LICENSE
+
+ Placed in the public domain and also MIT licensed.
+ See end of file for detailed license information.
+
+DOCUMENTATION
+
+ Dynamic Arrays
+
+ Non-function interface:
+
+ Declare an empty dynamic array of type T
+ T* foo = NULL;
+
+ Access the i'th item of a dynamic array 'foo' of type T, T* foo:
+ foo[i]
+
+ Functions (actually macros)
+
+ arrfree:
+ void arrfree(T*);
+ Frees the array.
+
+ arrlen:
+ ptrdiff_t arrlen(T*);
+ Returns the number of elements in the array.
+
+ arrlenu:
+ size_t arrlenu(T*);
+ Returns the number of elements in the array as an unsigned type.
+
+ arrpop:
+ T arrpop(T* a)
+ Removes the final element of the array and returns it.
+
+ arrput:
+ T arrput(T* a, T b);
+ Appends the item b to the end of array a. Returns b.
+
+ arrins:
+ T arrins(T* a, int p, T b);
+ Inserts the item b into the middle of array a, into a[p],
+ moving the rest of the array over. Returns b.
+
+ arrinsn:
+ void arrinsn(T* a, int p, int n);
+ Inserts n uninitialized items into array a starting at a[p],
+ moving the rest of the array over.
+
+ arraddnptr:
+ T* arraddnptr(T* a, int n)
+ Appends n uninitialized items onto array at the end.
+ Returns a pointer to the first uninitialized item added.
+
+ arraddnindex:
+ size_t arraddnindex(T* a, int n)
+ Appends n uninitialized items onto array at the end.
+ Returns the index of the first uninitialized item added.
+
+ arrdel:
+ void arrdel(T* a, int p);
+ Deletes the element at a[p], moving the rest of the array over.
+
+ arrdeln:
+ void arrdeln(T* a, int p, int n);
+ Deletes n elements starting at a[p], moving the rest of the array over.
+
+ arrdelswap:
+ void arrdelswap(T* a, int p);
+ Deletes the element at a[p], replacing it with the element from
+ the end of the array. O(1) performance.
+
+ arrsetlen:
+ void arrsetlen(T* a, int n);
+ Changes the length of the array to n. Allocates uninitialized
+ slots at the end if necessary.
+
+ arrsetcap:
+ size_t arrsetcap(T* a, int n);
+ Sets the length of allocated storage to at least n. It will not
+ change the length of the array.
+
+ arrcap:
+ size_t arrcap(T* a);
+ Returns the number of total elements the array can contain without
+ needing to be reallocated.
+
+ Hash maps & String hash maps
+
+ T must be a structure type: struct { TK key; TV value; }. Note that some
+ functions do not use the 'value' field, and T may contain other fields. For
+ string hash maps, TK must be 'char *'.
+
+ Special interface:
+
+ stbds_rand_seed:
+ void stbds_rand_seed(size_t seed);
+ For security against adversarially chosen data, you should seed the
+ library with a strong random number. Or at least seed it with time().
+
+ stbds_hash_string:
+ size_t stbds_hash_string(char *str, size_t seed);
+ Returns a hash value for a string.
+
+ stbds_hash_bytes:
+ size_t stbds_hash_bytes(void *p, size_t len, size_t seed);
+ These functions hash an arbitrary number of bytes. The function
+ uses a custom hash for 4- and 8-byte data, and a weakened version
+ of SipHash for everything else. On 64-bit platforms you can get
+ specification-compliant SipHash-2-4 on all data by defining
+ STBDS_SIPHASH_2_4, at a significant cost in speed.
+
+ Non-function interface:
+
+ Declare an empty hash map of type T
+ T* foo = NULL;
+
+ Access the i'th entry in a hash table T* foo:
+ foo[i]
+
+ Function interface (actually macros):
+
+ hmfree
+ shfree
+ void hmfree(T*);
+ void shfree(T*);
+ Frees the hashmap and sets the pointer to NULL.
+
+ hmlen
+ shlen
+ ptrdiff_t hmlen(T*)
+ ptrdiff_t shlen(T*)
+ Returns the number of elements in the hashmap.
+
+ hmlenu
+ shlenu
+ size_t hmlenu(T*)
+ size_t shlenu(T*)
+ Returns the number of elements in the hashmap.
+
+ hmgeti
+ shgeti
+ hmgeti_ts
+ ptrdiff_t hmgeti(T*, TK key)
+ ptrdiff_t shgeti(T*, char* key)
+ ptrdiff_t hmgeti_ts(T*, TK key, ptrdiff_t tempvar)
+ Returns the index in the hashmap which has the key 'key', or -1
+ if the key is not present.
+
+ hmget
+ hmget_ts
+ shget
+ TV hmget(T*, TK key)
+ TV shget(T*, char* key)
+ TV hmget_ts(T*, TK key, ptrdiff_t tempvar)
+ Returns the value corresponding to 'key' in the hashmap.
+ The structure must have a 'value' field.
+
+ hmgets
+ shgets
+ T hmgets(T*, TK key)
+ T shgets(T*, char* key)
+ Returns the structure corresponding to 'key' in the hashmap.
+
+ hmgetp
+ shgetp
+ hmgetp_ts
+ hmgetp_null
+ shgetp_null
+ T* hmgetp(T*, TK key)
+ T* shgetp(T*, char* key)
+ T* hmgetp_ts(T*, TK key, ptrdiff_t tempvar)
+ T* hmgetp_null(T*, TK key)
+ T* shgetp_null(T*, char *key)
+ Returns a pointer to the structure corresponding to 'key' in
+ the hashmap. Functions ending in "_null" return NULL if the key
+ is not present in the hashmap; the others return a pointer to a
+ structure holding the default value (but not the searched-for key).
+
+ hmdefault
+ shdefault
+ TV hmdefault(T*, TV value)
+ TV shdefault(T*, TV value)
+ Sets the default value for the hashmap, the value which will be
+ returned by hmget/shget if the key is not present.
+
+ hmdefaults
+ shdefaults
+ TV hmdefaults(T*, T item)
+ TV shdefaults(T*, T item)
+ Sets the default struct for the hashmap, the contents which will be
+ returned by hmgets/shgets if the key is not present.
+
+ hmput
+ shput
+ TV hmput(T*, TK key, TV value)
+ TV shput(T*, char* key, TV value)
+ Inserts a pair into the hashmap. If the key is already
+ present in the hashmap, updates its value.
+
+ hmputs
+ shputs
+ T hmputs(T*, T item)
+ T shputs(T*, T item)
+ Inserts a struct with T.key into the hashmap. If the struct is already
+ present in the hashmap, updates it.
+
+ hmdel
+ shdel
+ int hmdel(T*, TK key)
+ int shdel(T*, char* key)
+ If 'key' is in the hashmap, deletes its entry and returns 1.
+ Otherwise returns 0.
+
+ Function interface (actually macros) for strings only:
+
+ sh_new_strdup
+ void sh_new_strdup(T*);
+ Overwrites the existing pointer with a newly allocated
+ string hashmap which will automatically allocate and free
+ each string key using realloc/free
+
+ sh_new_arena
+ void sh_new_arena(T*);
+ Overwrites the existing pointer with a newly allocated
+ string hashmap which will automatically allocate each string
+ key to a string arena. Every string key ever used by this
+ hash table remains in the arena until the arena is freed.
+ Additionally, any key which is deleted and reinserted will
+ be allocated multiple times in the string arena.
+
+NOTES
+
+ * These data structures are realloc'd when they grow, and the macro
+ "functions" write to the provided pointer. This means: (a) the pointer
+ must be an lvalue, and (b) the pointer to the data structure is not
+ stable, and you must treat it just as you would a realloc'd
+ pointer. For example, if you pass a pointer to a dynamic array to a
+ function which updates it, the function must return back the new
+ pointer to the caller. This is the price of trying to do this in C.
+
+ * The following are the only functions that are thread-safe on a single data
+ structure, i.e. can be run in multiple threads simultaneously on the same
+ data structure:
+ hmlen shlen
+ hmlenu shlenu
+ hmget_ts shget_ts
+ hmgeti_ts shgeti_ts
+ hmgets_ts shgets_ts
+
+ * You iterate over the contents of a dynamic array and a hashmap in exactly
+ the same way, using arrlen/hmlen/shlen:
+
+ for (i=0; i < arrlen(foo); ++i)
+ ... foo[i] ...
+
+ * All operations except arrins/arrdel are O(1) amortized, but individual
+ operations can be slow, so these data structures may not be suitable
+ for real time use. Dynamic arrays double in capacity as needed, so
+ elements are copied an average of once. Hash tables double/halve
+ their size as needed, with appropriate hysteresis to maintain O(1)
+ performance.
+
+NOTES - DYNAMIC ARRAY
+
+ * If you know how long a dynamic array is going to be in advance, you can avoid
+ extra memory allocations by using arrsetlen to allocate it to that length in
+ advance and use foo[n] while filling it out, or arrsetcap to allocate the memory
+ for that length and use arrput/arrpush as normal.
+
+ * Unlike some other versions of the dynamic array, this version should
+ be safe to use with strict-aliasing optimizations.
+
+NOTES - HASH MAP
+
+ * For compilers other than GCC and clang (e.g. Visual Studio), for hmput/hmget/hmdel
+ and variants, the key must be an lvalue (so the macro can take the address of it).
+ Extensions are used that eliminate this requirement if you're using C99 and later
+ in GCC or clang, or if you're using C++ in GCC. But note that this can make your
+ code less portable.
+
+ * To test for presence of a key in a hashmap, just do 'hmgeti(foo,key) >= 0'.
+
+ * The iteration order of your data in the hashmap is determined solely by the
+ order of insertions and deletions. In particular, if you never delete, new
+ keys are always added at the end of the array. This will be consistent
+ across all platforms and versions of the library. However, you should not
+ attempt to serialize the internal hash table, as the hash is not consistent
+ between different platforms, and may change with future versions of the library.
+
+ * Use sh_new_arena() for string hashmaps that you never delete from. Initialize
+ with NULL if you're managing the memory for your strings, or your strings are
+ never freed (at least until the hashmap is freed). Otherwise, use sh_new_strdup().
+ @TODO: make an arena variant that garbage collects the strings with a trivial
+ copy collector into a new arena whenever the table shrinks / rebuilds. Since
+ the current recommendation is to use the arena only if you never delete,
+ this could simply replace the current arena implementation.
+
+ * If adversarial input is a serious concern and you're on a 64-bit platform,
+ enable STBDS_SIPHASH_2_4 (see the 'Compile-time options' section), and pass
+ a strong random number to stbds_rand_seed.
+
+ * The default value for the hash table is stored in foo[-1], so if you
+ use code like 'hmgetp(T,k)->value = 5' you can accidentally overwrite
+ the value stored by hmdefault if 'k' is not present.
+
+CREDITS
+
+ Sean Barrett -- library, idea for dynamic array API/implementation
+ Per Vognsen -- idea for hash table API/implementation
+ Rafael Sachetto -- arrpop()
+ github:HeroicKatora -- arraddn() reworking
+
+ Bugfixes:
+ Andy Durdin
+ Shane Liesegang
+ Vinh Truong
+ Andreas Molzer
+ github:hashitaku
+ github:srdjanstipic
+ Macoy Madson
+ Andreas Vennstrom
+ Tobias Mansfield-Williams
+*/
+
+#ifdef STBDS_UNIT_TESTS
+#define _CRT_SECURE_NO_WARNINGS
+#endif
+
+#ifndef INCLUDE_STB_DS_H
+#define INCLUDE_STB_DS_H
+
+#include <stddef.h> // for size_t
+#include <string.h> // for memmove
+
+#ifndef STBDS_NO_SHORT_NAMES
+#define arrlen stbds_arrlen
+#define arrlenu stbds_arrlenu
+#define arrput stbds_arrput
+#define arrpush stbds_arrput
+#define arrpop stbds_arrpop
+#define arrfree stbds_arrfree
+#define arraddn stbds_arraddn // deprecated, use one of the following instead:
+#define arraddnptr stbds_arraddnptr
+#define arraddnindex stbds_arraddnindex
+#define arrsetlen stbds_arrsetlen
+#define arrlast stbds_arrlast
+#define arrins stbds_arrins
+#define arrinsn stbds_arrinsn
+#define arrdel stbds_arrdel
+#define arrdeln stbds_arrdeln
+#define arrdelswap stbds_arrdelswap
+#define arrcap stbds_arrcap
+#define arrsetcap stbds_arrsetcap
+
+#define hmput stbds_hmput
+#define hmputs stbds_hmputs
+#define hmget stbds_hmget
+#define hmget_ts stbds_hmget_ts
+#define hmgets stbds_hmgets
+#define hmgetp stbds_hmgetp
+#define hmgetp_ts stbds_hmgetp_ts
+#define hmgetp_null stbds_hmgetp_null
+#define hmgeti stbds_hmgeti
+#define hmgeti_ts stbds_hmgeti_ts
+#define hmdel stbds_hmdel
+#define hmlen stbds_hmlen
+#define hmlenu stbds_hmlenu
+#define hmfree stbds_hmfree
+#define hmdefault stbds_hmdefault
+#define hmdefaults stbds_hmdefaults
+
+#define shput stbds_shput
+#define shputi stbds_shputi
+#define shputs stbds_shputs
+#define shget stbds_shget
+#define shgeti stbds_shgeti
+#define shgets stbds_shgets
+#define shgetp stbds_shgetp
+#define shgetp_null stbds_shgetp_null
+#define shdel stbds_shdel
+#define shlen stbds_shlen
+#define shlenu stbds_shlenu
+#define shfree stbds_shfree
+#define shdefault stbds_shdefault
+#define shdefaults stbds_shdefaults
+#define sh_new_arena stbds_sh_new_arena
+#define sh_new_strdup stbds_sh_new_strdup
+
+#define stralloc stbds_stralloc
+#define strreset stbds_strreset
+#endif
+
+#if defined(STBDS_REALLOC) && !defined(STBDS_FREE) || !defined(STBDS_REALLOC) && defined(STBDS_FREE)
+#error "You must define both STBDS_REALLOC and STBDS_FREE, or neither."
+#endif
+#if !defined(STBDS_REALLOC) && !defined(STBDS_FREE)
+#include <stdlib.h>
+#define STBDS_REALLOC(c,p,s) realloc(p,s)
+#define STBDS_FREE(c,p) free(p)
+#endif
+
+#ifdef _MSC_VER
+#define STBDS_NOTUSED(v) (void)(v)
+#else
+#define STBDS_NOTUSED(v) (void)sizeof(v)
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// for security against attackers, seed the library with a random number, at least time() but stronger is better
+extern void stbds_rand_seed(size_t seed);
+
+// these are the hash functions used internally if you want to test them or use them for other purposes
+extern size_t stbds_hash_bytes(void *p, size_t len, size_t seed);
+extern size_t stbds_hash_string(char *str, size_t seed);
+
+// this is a simple string arena allocator, initialize with e.g. 'stbds_string_arena my_arena={0}'.
+typedef struct stbds_string_arena stbds_string_arena;
+extern char * stbds_stralloc(stbds_string_arena *a, char *str);
+extern void stbds_strreset(stbds_string_arena *a);
+
+// have to #define STBDS_UNIT_TESTS to call this
+extern void stbds_unit_tests(void);
+
+///////////////
+//
+// Everything below here is implementation details
+//
+
+extern void * stbds_arrgrowf(void *a, size_t elemsize, size_t addlen, size_t min_cap);
+extern void stbds_arrfreef(void *a);
+extern void stbds_hmfree_func(void *p, size_t elemsize);
+extern void * stbds_hmget_key(void *a, size_t elemsize, void *key, size_t keysize, int mode);
+extern void * stbds_hmget_key_ts(void *a, size_t elemsize, void *key, size_t keysize, ptrdiff_t *temp, int mode);
+extern void * stbds_hmput_default(void *a, size_t elemsize);
+extern void * stbds_hmput_key(void *a, size_t elemsize, void *key, size_t keysize, int mode);
+extern void * stbds_hmdel_key(void *a, size_t elemsize, void *key, size_t keysize, size_t keyoffset, int mode);
+extern void * stbds_shmode_func(size_t elemsize, int mode);
+
+#ifdef __cplusplus
+}
+#endif
+
+#if defined(__GNUC__) || defined(__clang__)
+#define STBDS_HAS_TYPEOF
+#ifdef __cplusplus
+//#define STBDS_HAS_LITERAL_ARRAY // this is currently broken for clang
+#endif
+#endif
+
+#if !defined(__cplusplus)
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+#define STBDS_HAS_LITERAL_ARRAY
+#endif
+#endif
+
+// this macro takes the address of the argument, but on gcc/clang can accept rvalues
+#if defined(STBDS_HAS_LITERAL_ARRAY) && defined(STBDS_HAS_TYPEOF)
+ #if __clang__
+ #define STBDS_ADDRESSOF(typevar, value) ((__typeof__(typevar)[1]){value}) // literal array decays to pointer to value
+ #else
+ #define STBDS_ADDRESSOF(typevar, value) ((typeof(typevar)[1]){value}) // literal array decays to pointer to value
+ #endif
+#else
+#define STBDS_ADDRESSOF(typevar, value) &(value)
+#endif
+
+#define STBDS_OFFSETOF(var,field) ((char *) &(var)->field - (char *) (var))
+
+#define stbds_header(t) ((stbds_array_header *) (t) - 1)
+#define stbds_temp(t) stbds_header(t)->temp
+#define stbds_temp_key(t) (*(char **) stbds_header(t)->hash_table)
+
+#define stbds_arrsetcap(a,n) (stbds_arrgrow(a,0,n))
+#define stbds_arrsetlen(a,n) ((stbds_arrcap(a) < (size_t) (n) ? stbds_arrsetcap((a),(size_t)(n)),0 : 0), (a) ? stbds_header(a)->length = (size_t) (n) : 0)
+#define stbds_arrcap(a) ((a) ? stbds_header(a)->capacity : 0)
+#define stbds_arrlen(a) ((a) ? (ptrdiff_t) stbds_header(a)->length : 0)
+#define stbds_arrlenu(a) ((a) ? stbds_header(a)->length : 0)
+#define stbds_arrput(a,v) (stbds_arrmaybegrow(a,1), (a)[stbds_header(a)->length++] = (v))
+#define stbds_arrpush stbds_arrput // synonym
+#define stbds_arrpop(a) (stbds_header(a)->length--, (a)[stbds_header(a)->length])
+#define stbds_arraddn(a,n) ((void)(stbds_arraddnindex(a, n))) // deprecated, use one of the following instead:
+#define stbds_arraddnptr(a,n) (stbds_arrmaybegrow(a,n), (n) ? (stbds_header(a)->length += (n), &(a)[stbds_header(a)->length-(n)]) : (a))
+#define stbds_arraddnindex(a,n)(stbds_arrmaybegrow(a,n), (n) ? (stbds_header(a)->length += (n), stbds_header(a)->length-(n)) : stbds_arrlen(a))
+#define stbds_arraddnoff stbds_arraddnindex
+#define stbds_arrlast(a) ((a)[stbds_header(a)->length-1])
+#define stbds_arrfree(a) ((void) ((a) ? STBDS_FREE(NULL,stbds_header(a)) : (void)0), (a)=NULL)
+#define stbds_arrdel(a,i) stbds_arrdeln(a,i,1)
+#define stbds_arrdeln(a,i,n) (memmove(&(a)[i], &(a)[(i)+(n)], sizeof *(a) * (stbds_header(a)->length-(n)-(i))), stbds_header(a)->length -= (n))
+#define stbds_arrdelswap(a,i) ((a)[i] = stbds_arrlast(a), stbds_header(a)->length -= 1)
+#define stbds_arrinsn(a,i,n) (stbds_arraddn((a),(n)), memmove(&(a)[(i)+(n)], &(a)[i], sizeof *(a) * (stbds_header(a)->length-(n)-(i))))
+#define stbds_arrins(a,i,v) (stbds_arrinsn((a),(i),1), (a)[i]=(v))
+
+#define stbds_arrmaybegrow(a,n) ((!(a) || stbds_header(a)->length + (n) > stbds_header(a)->capacity) \
+ ? (stbds_arrgrow(a,n,0),0) : 0)
+
+#define stbds_arrgrow(a,b,c) ((a) = stbds_arrgrowf_wrapper((a), sizeof *(a), (b), (c)))
+
+#define stbds_hmput(t, k, v) \
+ ((t) = stbds_hmput_key_wrapper((t), sizeof *(t), (void*) STBDS_ADDRESSOF((t)->key, (k)), sizeof (t)->key, 0), \
+ (t)[stbds_temp((t)-1)].key = (k), \
+ (t)[stbds_temp((t)-1)].value = (v))
+
+#define stbds_hmputs(t, s) \
+ ((t) = stbds_hmput_key_wrapper((t), sizeof *(t), &(s).key, sizeof (s).key, STBDS_HM_BINARY), \
+ (t)[stbds_temp((t)-1)] = (s))
+
+#define stbds_hmgeti(t,k) \
+ ((t) = stbds_hmget_key_wrapper((t), sizeof *(t), (void*) STBDS_ADDRESSOF((t)->key, (k)), sizeof (t)->key, STBDS_HM_BINARY), \
+ stbds_temp((t)-1))
+
+#define stbds_hmgeti_ts(t,k,temp) \
+ ((t) = stbds_hmget_key_ts_wrapper((t), sizeof *(t), (void*) STBDS_ADDRESSOF((t)->key, (k)), sizeof (t)->key, &(temp), STBDS_HM_BINARY), \
+ (temp))
+
+#define stbds_hmgetp(t, k) \
+ ((void) stbds_hmgeti(t,k), &(t)[stbds_temp((t)-1)])
+
+#define stbds_hmgetp_ts(t, k, temp) \
+ ((void) stbds_hmgeti_ts(t,k,temp), &(t)[temp])
+
+#define stbds_hmdel(t,k) \
+ (((t) = stbds_hmdel_key_wrapper((t),sizeof *(t), (void*) STBDS_ADDRESSOF((t)->key, (k)), sizeof (t)->key, STBDS_OFFSETOF((t),key), STBDS_HM_BINARY)),(t)?stbds_temp((t)-1):0)
+
+#define stbds_hmdefault(t, v) \
+ ((t) = stbds_hmput_default_wrapper((t), sizeof *(t)), (t)[-1].value = (v))
+
+#define stbds_hmdefaults(t, s) \
+ ((t) = stbds_hmput_default_wrapper((t), sizeof *(t)), (t)[-1] = (s))
+
+#define stbds_hmfree(p) \
+ ((void) ((p) != NULL ? stbds_hmfree_func((p)-1,sizeof*(p)),0 : 0),(p)=NULL)
+
+#define stbds_hmgets(t, k) (*stbds_hmgetp(t,k))
+#define stbds_hmget(t, k) (stbds_hmgetp(t,k)->value)
+#define stbds_hmget_ts(t, k, temp) (stbds_hmgetp_ts(t,k,temp)->value)
+#define stbds_hmlen(t) ((t) ? (ptrdiff_t) stbds_header((t)-1)->length-1 : 0)
+#define stbds_hmlenu(t) ((t) ? stbds_header((t)-1)->length-1 : 0)
+#define stbds_hmgetp_null(t,k) (stbds_hmgeti(t,k) == -1 ? NULL : &(t)[stbds_temp((t)-1)])
+
+#define stbds_shput(t, k, v) \
+ ((t) = stbds_hmput_key_wrapper((t), sizeof *(t), (void*) (k), sizeof (t)->key, STBDS_HM_STRING), \
+ (t)[stbds_temp((t)-1)].value = (v))
+
+#define stbds_shputi(t, k, v) \
+ ((t) = stbds_hmput_key_wrapper((t), sizeof *(t), (void*) (k), sizeof (t)->key, STBDS_HM_STRING), \
+ (t)[stbds_temp((t)-1)].value = (v), stbds_temp((t)-1))
+
+#define stbds_shputs(t, s) \
+ ((t) = stbds_hmput_key_wrapper((t), sizeof *(t), (void*) (s).key, sizeof (s).key, STBDS_HM_STRING), \
+ (t)[stbds_temp((t)-1)] = (s), \
+ (t)[stbds_temp((t)-1)].key = stbds_temp_key((t)-1)) // above line overwrites whole structure, so must rewrite key here if it was allocated internally
+
+#define stbds_pshput(t, p) \
+ ((t) = stbds_hmput_key_wrapper((t), sizeof *(t), (void*) (p)->key, sizeof (p)->key, STBDS_HM_PTR_TO_STRING), \
+ (t)[stbds_temp((t)-1)] = (p))
+
+#define stbds_shgeti(t,k) \
+ ((t) = stbds_hmget_key_wrapper((t), sizeof *(t), (void*) (k), sizeof (t)->key, STBDS_HM_STRING), \
+ stbds_temp((t)-1))
+
+#define stbds_pshgeti(t,k) \
+ ((t) = stbds_hmget_key_wrapper((t), sizeof *(t), (void*) (k), sizeof (*(t))->key, STBDS_HM_PTR_TO_STRING), \
+ stbds_temp((t)-1))
+
+#define stbds_shgetp(t, k) \
+ ((void) stbds_shgeti(t,k), &(t)[stbds_temp((t)-1)])
+
+#define stbds_pshget(t, k) \
+ ((void) stbds_pshgeti(t,k), (t)[stbds_temp((t)-1)])
+
+#define stbds_shdel(t,k) \
+ (((t) = stbds_hmdel_key_wrapper((t),sizeof *(t), (void*) (k), sizeof (t)->key, STBDS_OFFSETOF((t),key), STBDS_HM_STRING)),(t)?stbds_temp((t)-1):0)
+#define stbds_pshdel(t,k) \
+ (((t) = stbds_hmdel_key_wrapper((t),sizeof *(t), (void*) (k), sizeof (*(t))->key, STBDS_OFFSETOF(*(t),key), STBDS_HM_PTR_TO_STRING)),(t)?stbds_temp((t)-1):0)
+
+#define stbds_sh_new_arena(t) \
+ ((t) = stbds_shmode_func_wrapper(t, sizeof *(t), STBDS_SH_ARENA))
+#define stbds_sh_new_strdup(t) \
+ ((t) = stbds_shmode_func_wrapper(t, sizeof *(t), STBDS_SH_STRDUP))
+
+#define stbds_shdefault(t, v) stbds_hmdefault(t,v)
+#define stbds_shdefaults(t, s) stbds_hmdefaults(t,s)
+
+#define stbds_shfree stbds_hmfree
+#define stbds_shlenu stbds_hmlenu
+
+#define stbds_shgets(t, k) (*stbds_shgetp(t,k))
+#define stbds_shget(t, k) (stbds_shgetp(t,k)->value)
+#define stbds_shgetp_null(t,k) (stbds_shgeti(t,k) == -1 ? NULL : &(t)[stbds_temp((t)-1)])
+#define stbds_shlen stbds_hmlen
+
+typedef struct
+{
+ size_t length;
+ size_t capacity;
+ void * hash_table;
+ ptrdiff_t temp;
+} stbds_array_header;
+
+typedef struct stbds_string_block
+{
+ struct stbds_string_block *next;
+ char storage[8];
+} stbds_string_block;
+
+struct stbds_string_arena
+{
+ stbds_string_block *storage;
+ size_t remaining;
+ unsigned char block;
+ unsigned char mode; // this isn't used by the string arena itself
+};
+
+#define STBDS_HM_BINARY 0
+#define STBDS_HM_STRING 1
+
+enum
+{
+ STBDS_SH_NONE,
+ STBDS_SH_DEFAULT,
+ STBDS_SH_STRDUP,
+ STBDS_SH_ARENA
+};
+
+#ifdef __cplusplus
+// in C we use implicit assignment from these void*-returning functions to T*.
+// in C++ these templates make the same code work
+template<class T> static T * stbds_arrgrowf_wrapper(T *a, size_t elemsize, size_t addlen, size_t min_cap) {
+ return (T*)stbds_arrgrowf((void *)a, elemsize, addlen, min_cap);
+}
+template<class T> static T * stbds_hmget_key_wrapper(T *a, size_t elemsize, void *key, size_t keysize, int mode) {
+ return (T*)stbds_hmget_key((void*)a, elemsize, key, keysize, mode);
+}
+template<class T> static T * stbds_hmget_key_ts_wrapper(T *a, size_t elemsize, void *key, size_t keysize, ptrdiff_t *temp, int mode) {
+ return (T*)stbds_hmget_key_ts((void*)a, elemsize, key, keysize, temp, mode);
+}
+template<class T> static T * stbds_hmput_default_wrapper(T *a, size_t elemsize) {
+ return (T*)stbds_hmput_default((void *)a, elemsize);
+}
+template<class T> static T * stbds_hmput_key_wrapper(T *a, size_t elemsize, void *key, size_t keysize, int mode) {
+ return (T*)stbds_hmput_key((void*)a, elemsize, key, keysize, mode);
+}
+template<class T> static T * stbds_hmdel_key_wrapper(T *a, size_t elemsize, void *key, size_t keysize, size_t keyoffset, int mode){
+ return (T*)stbds_hmdel_key((void*)a, elemsize, key, keysize, keyoffset, mode);
+}
+template<class T> static T * stbds_shmode_func_wrapper(T *, size_t elemsize, int mode) {
+ return (T*)stbds_shmode_func(elemsize, mode);
+}
+#else
+#define stbds_arrgrowf_wrapper stbds_arrgrowf
+#define stbds_hmget_key_wrapper stbds_hmget_key
+#define stbds_hmget_key_ts_wrapper stbds_hmget_key_ts
+#define stbds_hmput_default_wrapper stbds_hmput_default
+#define stbds_hmput_key_wrapper stbds_hmput_key
+#define stbds_hmdel_key_wrapper stbds_hmdel_key
+#define stbds_shmode_func_wrapper(t,e,m) stbds_shmode_func(e,m)
+#endif
+
+#endif // INCLUDE_STB_DS_H
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// IMPLEMENTATION
+//
+
+#ifdef STB_DS_IMPLEMENTATION
+#include <assert.h>
+#include <string.h>
+
+#ifndef STBDS_ASSERT
+#define STBDS_ASSERT_WAS_UNDEFINED
+#define STBDS_ASSERT(x) ((void) 0)
+#endif
+
+#ifdef STBDS_STATISTICS
+#define STBDS_STATS(x) x
+size_t stbds_array_grow;
+size_t stbds_hash_grow;
+size_t stbds_hash_shrink;
+size_t stbds_hash_rebuild;
+size_t stbds_hash_probes;
+size_t stbds_hash_alloc;
+size_t stbds_rehash_probes;
+size_t stbds_rehash_items;
+#else
+#define STBDS_STATS(x)
+#endif
+
+//
+// stbds_arr implementation
+//
+
+//int *prev_allocs[65536];
+//int num_prev;
+
+void *stbds_arrgrowf(void *a, size_t elemsize, size_t addlen, size_t min_cap)
+{
+ stbds_array_header temp={0}; // force debugging
+ void *b;
+ size_t min_len = stbds_arrlen(a) + addlen;
+ (void) sizeof(temp);
+
+ // compute the minimum capacity needed
+ if (min_len > min_cap)
+ min_cap = min_len;
+
+ if (min_cap <= stbds_arrcap(a))
+ return a;
+
+ // increase needed capacity to guarantee O(1) amortized
+ if (min_cap < 2 * stbds_arrcap(a))
+ min_cap = 2 * stbds_arrcap(a);
+ else if (min_cap < 4)
+ min_cap = 4;
+
+ //if (num_prev < 65536) if (a) prev_allocs[num_prev++] = (int *) ((char *) a+1);
+ //if (num_prev == 2201)
+ // num_prev = num_prev;
+ b = STBDS_REALLOC(NULL, (a) ? stbds_header(a) : 0, elemsize * min_cap + sizeof(stbds_array_header));
+ //if (num_prev < 65536) prev_allocs[num_prev++] = (int *) (char *) b;
+ b = (char *) b + sizeof(stbds_array_header);
+ if (a == NULL) {
+ stbds_header(b)->length = 0;
+ stbds_header(b)->hash_table = 0;
+ stbds_header(b)->temp = 0;
+ } else {
+ STBDS_STATS(++stbds_array_grow);
+ }
+ stbds_header(b)->capacity = min_cap;
+
+ return b;
+}
+
+void stbds_arrfreef(void *a)
+{
+ STBDS_FREE(NULL, stbds_header(a));
+}
+
+//
+// stbds_hm hash table implementation
+//
+
+#ifdef STBDS_INTERNAL_SMALL_BUCKET
+#define STBDS_BUCKET_LENGTH 4
+#else
+#define STBDS_BUCKET_LENGTH 8
+#endif
+
+#define STBDS_BUCKET_SHIFT (STBDS_BUCKET_LENGTH == 8 ? 3 : 2)
+#define STBDS_BUCKET_MASK (STBDS_BUCKET_LENGTH-1)
+#define STBDS_CACHE_LINE_SIZE 64
+
+#define STBDS_ALIGN_FWD(n,a) (((n) + (a) - 1) & ~((a)-1))
+
+typedef struct
+{
+ size_t hash [STBDS_BUCKET_LENGTH];
+ ptrdiff_t index[STBDS_BUCKET_LENGTH];
+} stbds_hash_bucket; // in 32-bit, this is one 64-byte cache line; in 64-bit, each array is one 64-byte cache line
+
+typedef struct
+{
+ char * temp_key; // this MUST be the first field of the hash table
+ size_t slot_count;
+ size_t used_count;
+ size_t used_count_threshold;
+ size_t used_count_shrink_threshold;
+ size_t tombstone_count;
+ size_t tombstone_count_threshold;
+ size_t seed;
+ size_t slot_count_log2;
+ stbds_string_arena string;
+ stbds_hash_bucket *storage; // not a separate allocation, just 64-byte aligned storage after this struct
+} stbds_hash_index;
+
+#define STBDS_INDEX_EMPTY -1
+#define STBDS_INDEX_DELETED -2
+#define STBDS_INDEX_IN_USE(x) ((x) >= 0)
+
+#define STBDS_HASH_EMPTY 0
+#define STBDS_HASH_DELETED 1
+
+static size_t stbds_hash_seed=0x31415926;
+
+void stbds_rand_seed(size_t seed)
+{
+ stbds_hash_seed = seed;
+}
+
+#define stbds_load_32_or_64(var, temp, v32, v64_hi, v64_lo) \
+ temp = v64_lo ^ v32, temp <<= 16, temp <<= 16, temp >>= 16, temp >>= 16, /* discard if 32-bit */ \
+ var = v64_hi, var <<= 16, var <<= 16, /* discard if 32-bit */ \
+ var ^= temp ^ v32
+
+#define STBDS_SIZE_T_BITS ((sizeof (size_t)) * 8)
+
+static size_t stbds_probe_position(size_t hash, size_t slot_count, size_t slot_log2)
+{
+ size_t pos;
+ STBDS_NOTUSED(slot_log2);
+ pos = hash & (slot_count-1);
+ #ifdef STBDS_INTERNAL_BUCKET_START
+ pos &= ~STBDS_BUCKET_MASK;
+ #endif
+ return pos;
+}
+
+static size_t stbds_log2(size_t slot_count)
+{
+ size_t n=0;
+ while (slot_count > 1) {
+ slot_count >>= 1;
+ ++n;
+ }
+ return n;
+}
+
+static stbds_hash_index *stbds_make_hash_index(size_t slot_count, stbds_hash_index *ot)
+{
+ stbds_hash_index *t;
+ t = (stbds_hash_index *) STBDS_REALLOC(NULL,0,(slot_count >> STBDS_BUCKET_SHIFT) * sizeof(stbds_hash_bucket) + sizeof(stbds_hash_index) + STBDS_CACHE_LINE_SIZE-1);
+ t->storage = (stbds_hash_bucket *) STBDS_ALIGN_FWD((size_t) (t+1), STBDS_CACHE_LINE_SIZE);
+ t->slot_count = slot_count;
+ t->slot_count_log2 = stbds_log2(slot_count);
+ t->tombstone_count = 0;
+ t->used_count = 0;
+
+ #if 0 // A1
+ t->used_count_threshold = slot_count*12/16; // if 12/16th of table is occupied, grow
+ t->tombstone_count_threshold = slot_count* 2/16; // if tombstones are 2/16th of table, rebuild
+ t->used_count_shrink_threshold = slot_count* 4/16; // if table is only 4/16th full, shrink
+ #elif 1 // A2
+ //t->used_count_threshold = slot_count*12/16; // if 12/16th of table is occupied, grow
+ //t->tombstone_count_threshold = slot_count* 3/16; // if tombstones are 3/16th of table, rebuild
+ //t->used_count_shrink_threshold = slot_count* 4/16; // if table is only 4/16th full, shrink
+
+ // compute without overflowing
+ t->used_count_threshold = slot_count - (slot_count>>2);
+ t->tombstone_count_threshold = (slot_count>>3) + (slot_count>>4);
+ t->used_count_shrink_threshold = slot_count >> 2;
+
+ #elif 0 // B1
+ t->used_count_threshold = slot_count*13/16; // if 13/16th of table is occupied, grow
+ t->tombstone_count_threshold = slot_count* 2/16; // if tombstones are 2/16th of table, rebuild
+ t->used_count_shrink_threshold = slot_count* 5/16; // if table is only 5/16th full, shrink
+ #else // C1
+ t->used_count_threshold = slot_count*14/16; // if 14/16th of table is occupied, grow
+ t->tombstone_count_threshold = slot_count* 2/16; // if tombstones are 2/16th of table, rebuild
+ t->used_count_shrink_threshold = slot_count* 6/16; // if table is only 6/16th full, shrink
+ #endif
+ // Following statistics were measured on a Core i7-6700 @ 4.00Ghz, compiled with clang 7.0.1 -O2
+ // Note that the larger tables have high variance as they were run fewer times
+ // A1 A2 B1 C1
+ // 0.10ms : 0.10ms : 0.10ms : 0.11ms : 2,000 inserts creating 2K table
+ // 0.96ms : 0.95ms : 0.97ms : 1.04ms : 20,000 inserts creating 20K table
+ // 14.48ms : 14.46ms : 10.63ms : 11.00ms : 200,000 inserts creating 200K table
+ // 195.74ms : 196.35ms : 203.69ms : 214.92ms : 2,000,000 inserts creating 2M table
+ // 2193.88ms : 2209.22ms : 2285.54ms : 2437.17ms : 20,000,000 inserts creating 20M table
+ // 65.27ms : 53.77ms : 65.33ms : 65.47ms : 500,000 inserts & deletes in 2K table
+ // 72.78ms : 62.45ms : 71.95ms : 72.85ms : 500,000 inserts & deletes in 20K table
+ // 89.47ms : 77.72ms : 96.49ms : 96.75ms : 500,000 inserts & deletes in 200K table
+ // 97.58ms : 98.14ms : 97.18ms : 97.53ms : 500,000 inserts & deletes in 2M table
+ // 118.61ms : 119.62ms : 120.16ms : 118.86ms : 500,000 inserts & deletes in 20M table
+ // 192.11ms : 194.39ms : 196.38ms : 195.73ms : 500,000 inserts & deletes in 200M table
+
+ if (slot_count <= STBDS_BUCKET_LENGTH)
+ t->used_count_shrink_threshold = 0;
+ // to avoid infinite loop, we need to guarantee that at least one slot is empty and will terminate probes
+ STBDS_ASSERT(t->used_count_threshold + t->tombstone_count_threshold < t->slot_count);
+ STBDS_STATS(++stbds_hash_alloc);
+ if (ot) {
+ t->string = ot->string;
+ // reuse old seed so we can reuse old hashes so below "copy out old data" doesn't do any hashing
+ t->seed = ot->seed;
+ } else {
+ size_t a,b,temp;
+ memset(&t->string, 0, sizeof(t->string));
+ t->seed = stbds_hash_seed;
+ // LCG
+ // in 32-bit, a = 2147001325 b = 715136305
+ // in 64-bit, a = 2862933555777941757 b = 3037000493
+ stbds_load_32_or_64(a,temp, 2147001325, 0x27bb2ee6, 0x87b0b0fd);
+ stbds_load_32_or_64(b,temp, 715136305, 0, 0xb504f32d);
+ stbds_hash_seed = stbds_hash_seed * a + b;
+ }
+
+ {
+ size_t i,j;
+ for (i=0; i < slot_count >> STBDS_BUCKET_SHIFT; ++i) {
+ stbds_hash_bucket *b = &t->storage[i];
+ for (j=0; j < STBDS_BUCKET_LENGTH; ++j)
+ b->hash[j] = STBDS_HASH_EMPTY;
+ for (j=0; j < STBDS_BUCKET_LENGTH; ++j)
+ b->index[j] = STBDS_INDEX_EMPTY;
+ }
+ }
+
+ // copy out the old data, if any
+ if (ot) {
+ size_t i,j;
+ t->used_count = ot->used_count;
+ for (i=0; i < ot->slot_count >> STBDS_BUCKET_SHIFT; ++i) {
+ stbds_hash_bucket *ob = &ot->storage[i];
+ for (j=0; j < STBDS_BUCKET_LENGTH; ++j) {
+ if (STBDS_INDEX_IN_USE(ob->index[j])) {
+ size_t hash = ob->hash[j];
+ size_t pos = stbds_probe_position(hash, t->slot_count, t->slot_count_log2);
+ size_t step = STBDS_BUCKET_LENGTH;
+ STBDS_STATS(++stbds_rehash_items);
+ for (;;) {
+ size_t limit,z;
+ stbds_hash_bucket *bucket;
+ bucket = &t->storage[pos >> STBDS_BUCKET_SHIFT];
+ STBDS_STATS(++stbds_rehash_probes);
+
+ for (z=pos & STBDS_BUCKET_MASK; z < STBDS_BUCKET_LENGTH; ++z) {
+ if (bucket->hash[z] == 0) {
+ bucket->hash[z] = hash;
+ bucket->index[z] = ob->index[j];
+ goto done;
+ }
+ }
+
+ limit = pos & STBDS_BUCKET_MASK;
+ for (z = 0; z < limit; ++z) {
+ if (bucket->hash[z] == 0) {
+ bucket->hash[z] = hash;
+ bucket->index[z] = ob->index[j];
+ goto done;
+ }
+ }
+
+ pos += step; // quadratic probing
+ step += STBDS_BUCKET_LENGTH;
+ pos &= (t->slot_count-1);
+ }
+ }
+ done:
+ ;
+ }
+ }
+ }
+
+ return t;
+}
+
+#define STBDS_ROTATE_LEFT(val, n) (((val) << (n)) | ((val) >> (STBDS_SIZE_T_BITS - (n))))
+#define STBDS_ROTATE_RIGHT(val, n) (((val) >> (n)) | ((val) << (STBDS_SIZE_T_BITS - (n))))
+
+size_t stbds_hash_string(char *str, size_t seed)
+{
+ size_t hash = seed;
+ while (*str)
+ hash = STBDS_ROTATE_LEFT(hash, 9) + (unsigned char) *str++;
+
+ // Thomas Wang 64-to-32 bit mix function, hopefully also works in 32 bits
+ hash ^= seed;
+ hash = (~hash) + (hash << 18);
+ hash ^= hash ^ STBDS_ROTATE_RIGHT(hash,31);
+ hash = hash * 21;
+ hash ^= hash ^ STBDS_ROTATE_RIGHT(hash,11);
+ hash += (hash << 6);
+ hash ^= STBDS_ROTATE_RIGHT(hash,22);
+ return hash+seed;
+}
+
+#ifdef STBDS_SIPHASH_2_4
+#define STBDS_SIPHASH_C_ROUNDS 2
+#define STBDS_SIPHASH_D_ROUNDS 4
+typedef int STBDS_SIPHASH_2_4_can_only_be_used_in_64_bit_builds[sizeof(size_t) == 8 ? 1 : -1];
+#endif
+
+#ifndef STBDS_SIPHASH_C_ROUNDS
+#define STBDS_SIPHASH_C_ROUNDS 1
+#endif
+#ifndef STBDS_SIPHASH_D_ROUNDS
+#define STBDS_SIPHASH_D_ROUNDS 1
+#endif
+
+#ifdef _MSC_VER
+#pragma warning(push)
+#pragma warning(disable:4127) // conditional expression is constant, for do..while(0) and sizeof()==
+#endif
+
+static size_t stbds_siphash_bytes(void *p, size_t len, size_t seed)
+{
+ unsigned char *d = (unsigned char *) p;
+ size_t i,j;
+ size_t v0,v1,v2,v3, data;
+
+ // hash that works on 32- or 64-bit registers without knowing which we have
+ // (computes different results on 32-bit and 64-bit platform)
+ // derived from siphash, but on 32-bit platforms very different as it uses 4 32-bit state not 4 64-bit
+ v0 = ((((size_t) 0x736f6d65 << 16) << 16) + 0x70736575) ^ seed;
+ v1 = ((((size_t) 0x646f7261 << 16) << 16) + 0x6e646f6d) ^ ~seed;
+ v2 = ((((size_t) 0x6c796765 << 16) << 16) + 0x6e657261) ^ seed;
+ v3 = ((((size_t) 0x74656462 << 16) << 16) + 0x79746573) ^ ~seed;
+
+ #ifdef STBDS_TEST_SIPHASH_2_4
+ // hardcoded with key material in the siphash test vectors
+ v0 ^= 0x0706050403020100ull ^ seed;
+ v1 ^= 0x0f0e0d0c0b0a0908ull ^ ~seed;
+ v2 ^= 0x0706050403020100ull ^ seed;
+ v3 ^= 0x0f0e0d0c0b0a0908ull ^ ~seed;
+ #endif
+
+ #define STBDS_SIPROUND() \
+ do { \
+ v0 += v1; v1 = STBDS_ROTATE_LEFT(v1, 13); v1 ^= v0; v0 = STBDS_ROTATE_LEFT(v0,STBDS_SIZE_T_BITS/2); \
+ v2 += v3; v3 = STBDS_ROTATE_LEFT(v3, 16); v3 ^= v2; \
+ v2 += v1; v1 = STBDS_ROTATE_LEFT(v1, 17); v1 ^= v2; v2 = STBDS_ROTATE_LEFT(v2,STBDS_SIZE_T_BITS/2); \
+ v0 += v3; v3 = STBDS_ROTATE_LEFT(v3, 21); v3 ^= v0; \
+ } while (0)
+
+ for (i=0; i+sizeof(size_t) <= len; i += sizeof(size_t), d += sizeof(size_t)) {
+ data = d[0] | (d[1] << 8) | (d[2] << 16) | (d[3] << 24);
+ data |= (size_t) (d[4] | (d[5] << 8) | (d[6] << 16) | (d[7] << 24)) << 16 << 16; // discarded if size_t == 4
+
+ v3 ^= data;
+ for (j=0; j < STBDS_SIPHASH_C_ROUNDS; ++j)
+ STBDS_SIPROUND();
+ v0 ^= data;
+ }
+ data = len << (STBDS_SIZE_T_BITS-8);
+ switch (len - i) {
+ case 7: data |= ((size_t) d[6] << 24) << 24; // fall through
+ case 6: data |= ((size_t) d[5] << 20) << 20; // fall through
+ case 5: data |= ((size_t) d[4] << 16) << 16; // fall through
+ case 4: data |= (d[3] << 24); // fall through
+ case 3: data |= (d[2] << 16); // fall through
+ case 2: data |= (d[1] << 8); // fall through
+ case 1: data |= d[0]; // fall through
+ case 0: break;
+ }
+ v3 ^= data;
+ for (j=0; j < STBDS_SIPHASH_C_ROUNDS; ++j)
+ STBDS_SIPROUND();
+ v0 ^= data;
+ v2 ^= 0xff;
+ for (j=0; j < STBDS_SIPHASH_D_ROUNDS; ++j)
+ STBDS_SIPROUND();
+
+#ifdef STBDS_SIPHASH_2_4
+ return v0^v1^v2^v3;
+#else
+ return v1^v2^v3; // slightly stronger since v0^v3 in above cancels out final round operation? I tweeted at the authors of SipHash about this but they didn't reply
+#endif
+}
+
+size_t stbds_hash_bytes(void *p, size_t len, size_t seed)
+{
+#ifdef STBDS_SIPHASH_2_4
+ return stbds_siphash_bytes(p,len,seed);
+#else
+ unsigned char *d = (unsigned char *) p;
+
+ if (len == 4) {
+ unsigned int hash = d[0] | (d[1] << 8) | (d[2] << 16) | (d[3] << 24);
+ #if 0
+ // HASH32-A Bob Jenkin's hash function w/o large constants
+ hash ^= seed;
+ hash -= (hash<<6);
+ hash ^= (hash>>17);
+ hash -= (hash<<9);
+ hash ^= seed;
+ hash ^= (hash<<4);
+ hash -= (hash<<3);
+ hash ^= (hash<<10);
+ hash ^= (hash>>15);
+ #elif 1
+ // HASH32-BB Bob Jenkin's presumably-accidental version of Thomas Wang hash with rotates turned into shifts.
+ // Note that converting these back to rotates makes it run a lot slower, presumably due to collisions, so I'm
+ // not really sure what's going on.
+ hash ^= seed;
+ hash = (hash ^ 61) ^ (hash >> 16);
+ hash = hash + (hash << 3);
+ hash = hash ^ (hash >> 4);
+ hash = hash * 0x27d4eb2d;
+ hash ^= seed;
+ hash = hash ^ (hash >> 15);
+ #else // HASH32-C - Murmur3
+ hash ^= seed;
+ hash *= 0xcc9e2d51;
+ hash = (hash << 17) | (hash >> 15);
+ hash *= 0x1b873593;
+ hash ^= seed;
+ hash = (hash << 19) | (hash >> 13);
+ hash = hash*5 + 0xe6546b64;
+ hash ^= hash >> 16;
+ hash *= 0x85ebca6b;
+ hash ^= seed;
+ hash ^= hash >> 13;
+ hash *= 0xc2b2ae35;
+ hash ^= hash >> 16;
+ #endif
+ // Following statistics were measured on a Core i7-6700 @ 4.00Ghz, compiled with clang 7.0.1 -O2
+ // Note that the larger tables have high variance as they were run fewer times
+ // HASH32-A // HASH32-BB // HASH32-C
+ // 0.10ms // 0.10ms // 0.10ms : 2,000 inserts creating 2K table
+ // 0.96ms // 0.95ms // 0.99ms : 20,000 inserts creating 20K table
+ // 14.69ms // 14.43ms // 14.97ms : 200,000 inserts creating 200K table
+ // 199.99ms // 195.36ms // 202.05ms : 2,000,000 inserts creating 2M table
+ // 2234.84ms // 2187.74ms // 2240.38ms : 20,000,000 inserts creating 20M table
+ // 55.68ms // 53.72ms // 57.31ms : 500,000 inserts & deletes in 2K table
+ // 63.43ms // 61.99ms // 65.73ms : 500,000 inserts & deletes in 20K table
+ // 80.04ms // 77.96ms // 81.83ms : 500,000 inserts & deletes in 200K table
+ // 100.42ms // 97.40ms // 102.39ms : 500,000 inserts & deletes in 2M table
+ // 119.71ms // 120.59ms // 121.63ms : 500,000 inserts & deletes in 20M table
+ // 185.28ms // 195.15ms // 187.74ms : 500,000 inserts & deletes in 200M table
+ // 15.58ms // 14.79ms // 15.52ms : 200,000 inserts creating 200K table with varying key spacing
+
+ return (((size_t) hash << 16 << 16) | hash) ^ seed;
+ } else if (len == 8 && sizeof(size_t) == 8) {
+ size_t hash = d[0] | (d[1] << 8) | (d[2] << 16) | (d[3] << 24);
+ hash |= (size_t) (d[4] | (d[5] << 8) | (d[6] << 16) | (d[7] << 24)) << 16 << 16; // avoid warning if size_t == 4
+ hash ^= seed;
+ hash = (~hash) + (hash << 21);
+ hash ^= STBDS_ROTATE_RIGHT(hash,24);
+ hash *= 265;
+ hash ^= STBDS_ROTATE_RIGHT(hash,14);
+ hash ^= seed;
+ hash *= 21;
+ hash ^= STBDS_ROTATE_RIGHT(hash,28);
+ hash += (hash << 31);
+ hash = (~hash) + (hash << 18);
+ return hash;
+ } else {
+ return stbds_siphash_bytes(p,len,seed);
+ }
+#endif
+}
+#ifdef _MSC_VER
+#pragma warning(pop)
+#endif
+
+
+static int stbds_is_key_equal(void *a, size_t elemsize, void *key, size_t keysize, size_t keyoffset, int mode, size_t i)
+{
+ if (mode >= STBDS_HM_STRING)
+ return 0==strcmp((char *) key, * (char **) ((char *) a + elemsize*i + keyoffset));
+ else
+ return 0==memcmp(key, (char *) a + elemsize*i + keyoffset, keysize);
+}
+
+#define STBDS_HASH_TO_ARR(x,elemsize) ((char*) (x) - (elemsize))
+#define STBDS_ARR_TO_HASH(x,elemsize) ((char*) (x) + (elemsize))
+
+#define stbds_hash_table(a) ((stbds_hash_index *) stbds_header(a)->hash_table)
+
+void stbds_hmfree_func(void *a, size_t elemsize)
+{
+ if (a == NULL) return;
+ if (stbds_hash_table(a) != NULL) {
+ if (stbds_hash_table(a)->string.mode == STBDS_SH_STRDUP) {
+ size_t i;
+ // skip 0th element, which is default
+ for (i=1; i < stbds_header(a)->length; ++i)
+ STBDS_FREE(NULL, *(char**) ((char *) a + elemsize*i));
+ }
+ stbds_strreset(&stbds_hash_table(a)->string);
+ }
+ STBDS_FREE(NULL, stbds_header(a)->hash_table);
+ STBDS_FREE(NULL, stbds_header(a));
+}
+
+static ptrdiff_t stbds_hm_find_slot(void *a, size_t elemsize, void *key, size_t keysize, size_t keyoffset, int mode)
+{
+ void *raw_a = STBDS_HASH_TO_ARR(a,elemsize);
+ stbds_hash_index *table = stbds_hash_table(raw_a);
+ size_t hash = mode >= STBDS_HM_STRING ? stbds_hash_string((char*)key,table->seed) : stbds_hash_bytes(key, keysize,table->seed);
+ size_t step = STBDS_BUCKET_LENGTH;
+ size_t limit,i;
+ size_t pos;
+ stbds_hash_bucket *bucket;
+
+ if (hash < 2) hash += 2; // stored hash values are forbidden from being 0, so we can detect empty slots
+
+ pos = stbds_probe_position(hash, table->slot_count, table->slot_count_log2);
+
+ for (;;) {
+ STBDS_STATS(++stbds_hash_probes);
+ bucket = &table->storage[pos >> STBDS_BUCKET_SHIFT];
+
+ // start searching from pos to end of bucket, this should help performance on small hash tables that fit in cache
+ for (i=pos & STBDS_BUCKET_MASK; i < STBDS_BUCKET_LENGTH; ++i) {
+ if (bucket->hash[i] == hash) {
+ if (stbds_is_key_equal(a, elemsize, key, keysize, keyoffset, mode, bucket->index[i])) {
+ return (pos & ~STBDS_BUCKET_MASK)+i;
+ }
+ } else if (bucket->hash[i] == STBDS_HASH_EMPTY) {
+ return -1;
+ }
+ }
+
+ // search from beginning of bucket to pos
+ limit = pos & STBDS_BUCKET_MASK;
+ for (i = 0; i < limit; ++i) {
+ if (bucket->hash[i] == hash) {
+ if (stbds_is_key_equal(a, elemsize, key, keysize, keyoffset, mode, bucket->index[i])) {
+ return (pos & ~STBDS_BUCKET_MASK)+i;
+ }
+ } else if (bucket->hash[i] == STBDS_HASH_EMPTY) {
+ return -1;
+ }
+ }
+
+ // quadratic probing
+ pos += step;
+ step += STBDS_BUCKET_LENGTH;
+ pos &= (table->slot_count-1);
+ }
+ /* NOTREACHED */
+}
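+// A note on the "quadratic probing" step above: pos advances by bucket-sized
+// triangular steps, visiting buckets at offsets 0, 1, 3, 6, 10, ... from the
+// home bucket. With a power-of-two slot_count this sequence is guaranteed to
+// eventually visit every bucket; e.g. with 4 buckets the offsets mod 4 are
+// 0, 1, 3, 2 — all four — so the loop cannot cycle while a slot remains free.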
+
+void * stbds_hmget_key_ts(void *a, size_t elemsize, void *key, size_t keysize, ptrdiff_t *temp, int mode)
+{
+ size_t keyoffset = 0;
+ if (a == NULL) {
+ // make it non-empty so we can return a temp
+ a = stbds_arrgrowf(0, elemsize, 0, 1);
+ stbds_header(a)->length += 1;
+ memset(a, 0, elemsize);
+ *temp = STBDS_INDEX_EMPTY;
+ // adjust a to point after the default element
+ return STBDS_ARR_TO_HASH(a,elemsize);
+ } else {
+ stbds_hash_index *table;
+ void *raw_a = STBDS_HASH_TO_ARR(a,elemsize);
+ // adjust a to point to the default element
+ table = (stbds_hash_index *) stbds_header(raw_a)->hash_table;
+ if (table == 0) {
+ *temp = -1;
+ } else {
+ ptrdiff_t slot = stbds_hm_find_slot(a, elemsize, key, keysize, keyoffset, mode);
+ if (slot < 0) {
+ *temp = STBDS_INDEX_EMPTY;
+ } else {
+ stbds_hash_bucket *b = &table->storage[slot >> STBDS_BUCKET_SHIFT];
+ *temp = b->index[slot & STBDS_BUCKET_MASK];
+ }
+ }
+ return a;
+ }
+}
+
+void * stbds_hmget_key(void *a, size_t elemsize, void *key, size_t keysize, int mode)
+{
+ ptrdiff_t temp;
+ void *p = stbds_hmget_key_ts(a, elemsize, key, keysize, &temp, mode);
+ stbds_temp(STBDS_HASH_TO_ARR(p,elemsize)) = temp;
+ return p;
+}
+
+void * stbds_hmput_default(void *a, size_t elemsize)
+{
+ // three cases:
+ // a is NULL <- allocate
+ // a has a hash table but no entries, because of shmode <- grow
+ // a has entries <- do nothing
+ if (a == NULL || stbds_header(STBDS_HASH_TO_ARR(a,elemsize))->length == 0) {
+ a = stbds_arrgrowf(a ? STBDS_HASH_TO_ARR(a,elemsize) : NULL, elemsize, 0, 1);
+ stbds_header(a)->length += 1;
+ memset(a, 0, elemsize);
+ a=STBDS_ARR_TO_HASH(a,elemsize);
+ }
+ return a;
+}
+
+static char *stbds_strdup(char *str);
+
+void *stbds_hmput_key(void *a, size_t elemsize, void *key, size_t keysize, int mode)
+{
+ size_t keyoffset=0;
+ void *raw_a;
+ stbds_hash_index *table;
+
+ if (a == NULL) {
+ a = stbds_arrgrowf(0, elemsize, 0, 1);
+ memset(a, 0, elemsize);
+ stbds_header(a)->length += 1;
+ // adjust a to point AFTER the default element
+ a = STBDS_ARR_TO_HASH(a,elemsize);
+ }
+
+ // adjust a to point to the default element
+ raw_a = a;
+ a = STBDS_HASH_TO_ARR(a,elemsize);
+
+ table = (stbds_hash_index *) stbds_header(a)->hash_table;
+
+ if (table == NULL || table->used_count >= table->used_count_threshold) {
+ stbds_hash_index *nt;
+ size_t slot_count;
+
+ slot_count = (table == NULL) ? STBDS_BUCKET_LENGTH : table->slot_count*2;
+ nt = stbds_make_hash_index(slot_count, table);
+ if (table)
+ STBDS_FREE(NULL, table);
+ else
+ nt->string.mode = mode >= STBDS_HM_STRING ? STBDS_SH_DEFAULT : 0;
+ stbds_header(a)->hash_table = table = nt;
+ STBDS_STATS(++stbds_hash_grow);
+ }
+
+ // we iterate hash table explicitly because we want to track if we saw a tombstone
+ {
+ size_t hash = mode >= STBDS_HM_STRING ? stbds_hash_string((char*)key,table->seed) : stbds_hash_bytes(key, keysize,table->seed);
+ size_t step = STBDS_BUCKET_LENGTH;
+ size_t pos;
+ ptrdiff_t tombstone = -1;
+ stbds_hash_bucket *bucket;
+
+ // stored hash values are forbidden from being 0, so we can detect empty slots to early out quickly
+ if (hash < 2) hash += 2;
+
+ pos = stbds_probe_position(hash, table->slot_count, table->slot_count_log2);
+
+ for (;;) {
+ size_t limit, i;
+ STBDS_STATS(++stbds_hash_probes);
+ bucket = &table->storage[pos >> STBDS_BUCKET_SHIFT];
+
+ // start searching from pos to end of bucket
+ for (i=pos & STBDS_BUCKET_MASK; i < STBDS_BUCKET_LENGTH; ++i) {
+ if (bucket->hash[i] == hash) {
+ if (stbds_is_key_equal(raw_a, elemsize, key, keysize, keyoffset, mode, bucket->index[i])) {
+ stbds_temp(a) = bucket->index[i];
+ if (mode >= STBDS_HM_STRING)
+ stbds_temp_key(a) = * (char **) ((char *) raw_a + elemsize*bucket->index[i] + keyoffset);
+ return STBDS_ARR_TO_HASH(a,elemsize);
+ }
+ } else if (bucket->hash[i] == 0) {
+ pos = (pos & ~STBDS_BUCKET_MASK) + i;
+ goto found_empty_slot;
+ } else if (tombstone < 0) {
+ if (bucket->index[i] == STBDS_INDEX_DELETED)
+ tombstone = (ptrdiff_t) ((pos & ~STBDS_BUCKET_MASK) + i);
+ }
+ }
+
+ // search from beginning of bucket to pos
+ limit = pos & STBDS_BUCKET_MASK;
+ for (i = 0; i < limit; ++i) {
+ if (bucket->hash[i] == hash) {
+ if (stbds_is_key_equal(raw_a, elemsize, key, keysize, keyoffset, mode, bucket->index[i])) {
+ stbds_temp(a) = bucket->index[i];
+ return STBDS_ARR_TO_HASH(a,elemsize);
+ }
+ } else if (bucket->hash[i] == 0) {
+ pos = (pos & ~STBDS_BUCKET_MASK) + i;
+ goto found_empty_slot;
+ } else if (tombstone < 0) {
+ if (bucket->index[i] == STBDS_INDEX_DELETED)
+ tombstone = (ptrdiff_t) ((pos & ~STBDS_BUCKET_MASK) + i);
+ }
+ }
+
+ // quadratic probing
+ pos += step;
+ step += STBDS_BUCKET_LENGTH;
+ pos &= (table->slot_count-1);
+ }
+ found_empty_slot:
+ if (tombstone >= 0) {
+ pos = tombstone;
+ --table->tombstone_count;
+ }
+ ++table->used_count;
+
+ {
+ ptrdiff_t i = (ptrdiff_t) stbds_arrlen(a);
+ // we want to do stbds_arraddn(1), but we can't use the macros since we don't have something of the right type
+ if ((size_t) i+1 > stbds_arrcap(a))
+ *(void **) &a = stbds_arrgrowf(a, elemsize, 1, 0);
+ raw_a = STBDS_ARR_TO_HASH(a,elemsize);
+
+ STBDS_ASSERT((size_t) i+1 <= stbds_arrcap(a));
+ stbds_header(a)->length = i+1;
+ bucket = &table->storage[pos >> STBDS_BUCKET_SHIFT];
+ bucket->hash[pos & STBDS_BUCKET_MASK] = hash;
+ bucket->index[pos & STBDS_BUCKET_MASK] = i-1;
+ stbds_temp(a) = i-1;
+
+ switch (table->string.mode) {
+ case STBDS_SH_STRDUP: stbds_temp_key(a) = *(char **) ((char *) a + elemsize*i) = stbds_strdup((char*) key); break;
+ case STBDS_SH_ARENA: stbds_temp_key(a) = *(char **) ((char *) a + elemsize*i) = stbds_stralloc(&table->string, (char*)key); break;
+ case STBDS_SH_DEFAULT: stbds_temp_key(a) = *(char **) ((char *) a + elemsize*i) = (char *) key; break;
+ default: memcpy((char *) a + elemsize*i, key, keysize); break;
+ }
+ }
+ return STBDS_ARR_TO_HASH(a,elemsize);
+ }
+}
+
+void * stbds_shmode_func(size_t elemsize, int mode)
+{
+ void *a = stbds_arrgrowf(0, elemsize, 0, 1);
+ stbds_hash_index *h;
+ memset(a, 0, elemsize);
+ stbds_header(a)->length = 1;
+ stbds_header(a)->hash_table = h = (stbds_hash_index *) stbds_make_hash_index(STBDS_BUCKET_LENGTH, NULL);
+ h->string.mode = (unsigned char) mode;
+ return STBDS_ARR_TO_HASH(a,elemsize);
+}
+
+void * stbds_hmdel_key(void *a, size_t elemsize, void *key, size_t keysize, size_t keyoffset, int mode)
+{
+ if (a == NULL) {
+ return 0;
+ } else {
+ stbds_hash_index *table;
+ void *raw_a = STBDS_HASH_TO_ARR(a,elemsize);
+ table = (stbds_hash_index *) stbds_header(raw_a)->hash_table;
+ stbds_temp(raw_a) = 0;
+ if (table == 0) {
+ return a;
+ } else {
+ ptrdiff_t slot;
+ slot = stbds_hm_find_slot(a, elemsize, key, keysize, keyoffset, mode);
+ if (slot < 0)
+ return a;
+ else {
+ stbds_hash_bucket *b = &table->storage[slot >> STBDS_BUCKET_SHIFT];
+ int i = slot & STBDS_BUCKET_MASK;
+ ptrdiff_t old_index = b->index[i];
+ ptrdiff_t final_index = (ptrdiff_t) stbds_arrlen(raw_a)-1-1; // minus one for the raw_a vs a, and minus one for 'last'
+ STBDS_ASSERT(slot < (ptrdiff_t) table->slot_count);
+ --table->used_count;
+ ++table->tombstone_count;
+ stbds_temp(raw_a) = 1;
+ STBDS_ASSERT(table->used_count >= 0);
+ //STBDS_ASSERT(table->tombstone_count < table->slot_count/4);
+ b->hash[i] = STBDS_HASH_DELETED;
+ b->index[i] = STBDS_INDEX_DELETED;
+
+ if (mode == STBDS_HM_STRING && table->string.mode == STBDS_SH_STRDUP)
+ STBDS_FREE(NULL, *(char**) ((char *) a+elemsize*old_index));
+
+ // if indices are the same, memcpy is a no-op, but back-pointer-fixup will fail, so skip
+ if (old_index != final_index) {
+ // swap delete
+ memmove((char*) a + elemsize*old_index, (char*) a + elemsize*final_index, elemsize);
+
+ // now find the slot for the last element
+ if (mode == STBDS_HM_STRING)
+ slot = stbds_hm_find_slot(a, elemsize, *(char**) ((char *) a+elemsize*old_index + keyoffset), keysize, keyoffset, mode);
+ else
+ slot = stbds_hm_find_slot(a, elemsize, (char* ) a+elemsize*old_index + keyoffset, keysize, keyoffset, mode);
+ STBDS_ASSERT(slot >= 0);
+ b = &table->storage[slot >> STBDS_BUCKET_SHIFT];
+ i = slot & STBDS_BUCKET_MASK;
+ STBDS_ASSERT(b->index[i] == final_index);
+ b->index[i] = old_index;
+ }
+ stbds_header(raw_a)->length -= 1;
+
+ if (table->used_count < table->used_count_shrink_threshold && table->slot_count > STBDS_BUCKET_LENGTH) {
+ stbds_header(raw_a)->hash_table = stbds_make_hash_index(table->slot_count>>1, table);
+ STBDS_FREE(NULL, table);
+ STBDS_STATS(++stbds_hash_shrink);
+ } else if (table->tombstone_count > table->tombstone_count_threshold) {
+ stbds_header(raw_a)->hash_table = stbds_make_hash_index(table->slot_count , table);
+ STBDS_FREE(NULL, table);
+ STBDS_STATS(++stbds_hash_rebuild);
+ }
+
+ return a;
+ }
+ }
+ }
+ /* NOTREACHED */
+}
+
+static char *stbds_strdup(char *str)
+{
+ // to keep replaceable allocator simple, we don't want to use strdup.
+ // rolling our own also avoids problem of strdup vs _strdup
+ size_t len = strlen(str)+1;
+ char *p = (char*) STBDS_REALLOC(NULL, 0, len);
+ memmove(p, str, len);
+ return p;
+}
+
+#ifndef STBDS_STRING_ARENA_BLOCKSIZE_MIN
+#define STBDS_STRING_ARENA_BLOCKSIZE_MIN 512u
+#endif
+#ifndef STBDS_STRING_ARENA_BLOCKSIZE_MAX
+#define STBDS_STRING_ARENA_BLOCKSIZE_MAX (1u<<20)
+#endif
+
+char *stbds_stralloc(stbds_string_arena *a, char *str)
+{
+ char *p;
+ size_t len = strlen(str)+1;
+ if (len > a->remaining) {
+ // compute the next blocksize
+ size_t blocksize = a->block;
+
+ // size is 512, 512, 1024, 1024, 2048, 2048, 4096, 4096, etc., so that
+ // there are log(SIZE) allocations to free when we destroy the table
+ blocksize = (size_t) (STBDS_STRING_ARENA_BLOCKSIZE_MIN) << (blocksize>>1);
+
+ // if size is under 1M, advance to next blocktype
+ if (blocksize < (size_t)(STBDS_STRING_ARENA_BLOCKSIZE_MAX))
+ ++a->block;
+
+ if (len > blocksize) {
+ // if string is larger than blocksize, then just allocate the full size.
+ // note that we still advance string_block so block size will continue
+ // increasing, so e.g. if somebody only calls this with 1000-long strings,
+ // eventually the arena will start doubling and handling those as well
+ stbds_string_block *sb = (stbds_string_block *) STBDS_REALLOC(NULL, 0, sizeof(*sb)-8 + len);
+ memmove(sb->storage, str, len);
+ if (a->storage) {
+ // insert it after the first element, so that we don't waste the space there
+ sb->next = a->storage->next;
+ a->storage->next = sb;
+ } else {
+ sb->next = 0;
+ a->storage = sb;
+ a->remaining = 0; // this is redundant, but good for clarity
+ }
+ return sb->storage;
+ } else {
+ stbds_string_block *sb = (stbds_string_block *) STBDS_REALLOC(NULL, 0, sizeof(*sb)-8 + blocksize);
+ sb->next = a->storage;
+ a->storage = sb;
+ a->remaining = blocksize;
+ }
+ }
+
+ STBDS_ASSERT(len <= a->remaining);
+ p = a->storage->storage + a->remaining - len;
+ a->remaining -= len;
+ memmove(p, str, len);
+ return p;
+}
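+// Example of the block-size schedule above, using the default 512-byte
+// minimum: successive newly allocated blocks are sized
+//    512u << (0>>1) == 512,  512u << (1>>1) == 512,
+//    512u << (2>>1) == 1024, 512u << (3>>1) == 1024, ...
+// i.e. the size doubles every two blocks until it reaches
+// STBDS_STRING_ARENA_BLOCKSIZE_MAX, so stbds_strreset only has to free
+// O(log total) blocks when the table is destroyed.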
+
+void stbds_strreset(stbds_string_arena *a)
+{
+ stbds_string_block *x,*y;
+ x = a->storage;
+ while (x) {
+ y = x->next;
+ STBDS_FREE(NULL, x);
+ x = y;
+ }
+ memset(a, 0, sizeof(*a));
+}
+
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// UNIT TESTS
+//
+
+#ifdef STBDS_UNIT_TESTS
+#include <stdio.h>
+#ifdef STBDS_ASSERT_WAS_UNDEFINED
+#undef STBDS_ASSERT
+#endif
+#ifndef STBDS_ASSERT
+#define STBDS_ASSERT assert
+#include <assert.h>
+#endif
+
+typedef struct { int key,b,c,d; } stbds_struct;
+typedef struct { int key[2],b,c,d; } stbds_struct2;
+
+static char buffer[256];
+char *strkey(int n)
+{
+#if defined(_WIN32) && defined(__STDC_WANT_SECURE_LIB__)
+ sprintf_s(buffer, sizeof(buffer), "test_%d", n);
+#else
+ sprintf(buffer, "test_%d", n);
+#endif
+ return buffer;
+}
+
+void stbds_unit_tests(void)
+{
+#if defined(_MSC_VER) && _MSC_VER <= 1200 && defined(__cplusplus)
+ // VC6 C++ doesn't like the template<> trick on unnamed structures, so do nothing!
+ STBDS_ASSERT(0);
+#else
+ const int testsize = 100000;
+ const int testsize2 = testsize/20;
+ int *arr=NULL;
+ struct { int key; int value; } *intmap = NULL;
+ struct { char *key; int value; } *strmap = NULL, s;
+ struct { stbds_struct key; int value; } *map = NULL;
+ stbds_struct *map2 = NULL;
+ stbds_struct2 *map3 = NULL;
+ stbds_string_arena sa = { 0 };
+ int key3[2] = { 1,2 };
+ ptrdiff_t temp;
+
+ int i,j;
+
+ STBDS_ASSERT(arrlen(arr)==0);
+ for (i=0; i < 20000; i += 50) {
+ for (j=0; j < i; ++j)
+ arrpush(arr,j);
+ arrfree(arr);
+ }
+
+ for (i=0; i < 4; ++i) {
+ arrpush(arr,1); arrpush(arr,2); arrpush(arr,3); arrpush(arr,4);
+ arrdel(arr,i);
+ arrfree(arr);
+ arrpush(arr,1); arrpush(arr,2); arrpush(arr,3); arrpush(arr,4);
+ arrdelswap(arr,i);
+ arrfree(arr);
+ }
+
+ for (i=0; i < 5; ++i) {
+ arrpush(arr,1); arrpush(arr,2); arrpush(arr,3); arrpush(arr,4);
+ stbds_arrins(arr,i,5);
+ STBDS_ASSERT(arr[i] == 5);
+ if (i < 4)
+ STBDS_ASSERT(arr[4] == 4);
+ arrfree(arr);
+ }
+
+ i = 1;
+ STBDS_ASSERT(hmgeti(intmap,i) == -1);
+ hmdefault(intmap, -2);
+ STBDS_ASSERT(hmgeti(intmap, i) == -1);
+ STBDS_ASSERT(hmget (intmap, i) == -2);
+ for (i=0; i < testsize; i+=2)
+ hmput(intmap, i, i*5);
+ for (i=0; i < testsize; i+=1) {
+ if (i & 1) STBDS_ASSERT(hmget(intmap, i) == -2 );
+ else STBDS_ASSERT(hmget(intmap, i) == i*5);
+ if (i & 1) STBDS_ASSERT(hmget_ts(intmap, i, temp) == -2 );
+ else STBDS_ASSERT(hmget_ts(intmap, i, temp) == i*5);
+ }
+ for (i=0; i < testsize; i+=2)
+ hmput(intmap, i, i*3);
+ for (i=0; i < testsize; i+=1)
+ if (i & 1) STBDS_ASSERT(hmget(intmap, i) == -2 );
+ else STBDS_ASSERT(hmget(intmap, i) == i*3);
+ for (i=2; i < testsize; i+=4)
+ hmdel(intmap, i); // delete half the entries
+ for (i=0; i < testsize; i+=1)
+ if (i & 3) STBDS_ASSERT(hmget(intmap, i) == -2 );
+ else STBDS_ASSERT(hmget(intmap, i) == i*3);
+ for (i=0; i < testsize; i+=1)
+ hmdel(intmap, i); // delete the rest of the entries
+ for (i=0; i < testsize; i+=1)
+ STBDS_ASSERT(hmget(intmap, i) == -2 );
+ hmfree(intmap);
+ for (i=0; i < testsize; i+=2)
+ hmput(intmap, i, i*3);
+ hmfree(intmap);
+
+ #if defined(__clang__) || defined(__GNUC__)
+ #ifndef __cplusplus
+ intmap = NULL;
+ hmput(intmap, 15, 7);
+ hmput(intmap, 11, 3);
+ hmput(intmap, 9, 5);
+ STBDS_ASSERT(hmget(intmap, 9) == 5);
+ STBDS_ASSERT(hmget(intmap, 11) == 3);
+ STBDS_ASSERT(hmget(intmap, 15) == 7);
+ #endif
+ #endif
+
+ for (i=0; i < testsize; ++i)
+ stralloc(&sa, strkey(i));
+ strreset(&sa);
+
+ {
+ s.key = "a", s.value = 1;
+ shputs(strmap, s);
+ STBDS_ASSERT(*strmap[0].key == 'a');
+ STBDS_ASSERT(strmap[0].key == s.key);
+ STBDS_ASSERT(strmap[0].value == s.value);
+ shfree(strmap);
+ }
+
+ {
+ s.key = "a", s.value = 1;
+ sh_new_strdup(strmap);
+ shputs(strmap, s);
+ STBDS_ASSERT(*strmap[0].key == 'a');
+ STBDS_ASSERT(strmap[0].key != s.key);
+ STBDS_ASSERT(strmap[0].value == s.value);
+ shfree(strmap);
+ }
+
+ {
+ s.key = "a", s.value = 1;
+ sh_new_arena(strmap);
+ shputs(strmap, s);
+ STBDS_ASSERT(*strmap[0].key == 'a');
+ STBDS_ASSERT(strmap[0].key != s.key);
+ STBDS_ASSERT(strmap[0].value == s.value);
+ shfree(strmap);
+ }
+
+ for (j=0; j < 2; ++j) {
+ STBDS_ASSERT(shgeti(strmap,"foo") == -1);
+ if (j == 0)
+ sh_new_strdup(strmap);
+ else
+ sh_new_arena(strmap);
+ STBDS_ASSERT(shgeti(strmap,"foo") == -1);
+ shdefault(strmap, -2);
+ STBDS_ASSERT(shgeti(strmap,"foo") == -1);
+ for (i=0; i < testsize; i+=2)
+ shput(strmap, strkey(i), i*3);
+ for (i=0; i < testsize; i+=1)
+ if (i & 1) STBDS_ASSERT(shget(strmap, strkey(i)) == -2 );
+ else STBDS_ASSERT(shget(strmap, strkey(i)) == i*3);
+ for (i=2; i < testsize; i+=4)
+ shdel(strmap, strkey(i)); // delete half the entries
+ for (i=0; i < testsize; i+=1)
+ if (i & 3) STBDS_ASSERT(shget(strmap, strkey(i)) == -2 );
+ else STBDS_ASSERT(shget(strmap, strkey(i)) == i*3);
+ for (i=0; i < testsize; i+=1)
+ shdel(strmap, strkey(i)); // delete the rest of the entries
+ for (i=0; i < testsize; i+=1)
+ STBDS_ASSERT(shget(strmap, strkey(i)) == -2 );
+ shfree(strmap);
+ }
+
+ {
+ struct { char *key; char value; } *hash = NULL;
+ char name[4] = "jen";
+ shput(hash, "bob" , 'h');
+ shput(hash, "sally" , 'e');
+ shput(hash, "fred" , 'l');
+ shput(hash, "jen" , 'x');
+ shput(hash, "doug" , 'o');
+
+ shput(hash, name , 'l');
+ shfree(hash);
+ }
+
+ for (i=0; i < testsize; i += 2) {
+ stbds_struct s = { i,i*2,i*3,i*4 };
+ hmput(map, s, i*5);
+ }
+
+ for (i=0; i < testsize; i += 1) {
+ stbds_struct s = { i,i*2,i*3 ,i*4 };
+ stbds_struct t = { i,i*2,i*3+1,i*4 };
+ if (i & 1) STBDS_ASSERT(hmget(map, s) == 0);
+ else STBDS_ASSERT(hmget(map, s) == i*5);
+ if (i & 1) STBDS_ASSERT(hmget_ts(map, s, temp) == 0);
+ else STBDS_ASSERT(hmget_ts(map, s, temp) == i*5);
+ //STBDS_ASSERT(hmget(map, t.key) == 0);
+ }
+
+ for (i=0; i < testsize; i += 2) {
+ stbds_struct s = { i,i*2,i*3,i*4 };
+ hmputs(map2, s);
+ }
+ hmfree(map);
+
+ for (i=0; i < testsize; i += 1) {
+ stbds_struct s = { i,i*2,i*3,i*4 };
+ stbds_struct t = { i,i*2,i*3+1,i*4 };
+ if (i & 1) STBDS_ASSERT(hmgets(map2, s.key).d == 0);
+ else STBDS_ASSERT(hmgets(map2, s.key).d == i*4);
+ //STBDS_ASSERT(hmgetp(map2, t.key) == 0);
+ }
+ hmfree(map2);
+
+ for (i=0; i < testsize; i += 2) {
+ stbds_struct2 s = { { i,i*2 }, i*3,i*4, i*5 };
+ hmputs(map3, s);
+ }
+ for (i=0; i < testsize; i += 1) {
+ stbds_struct2 s = { { i,i*2}, i*3, i*4, i*5 };
+ stbds_struct2 t = { { i,i*2}, i*3+1, i*4, i*5 };
+ if (i & 1) STBDS_ASSERT(hmgets(map3, s.key).d == 0);
+ else STBDS_ASSERT(hmgets(map3, s.key).d == i*5);
+ //STBDS_ASSERT(hmgetp(map3, t.key) == 0);
+ }
+#endif
+}
+#endif
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2019 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_dxt.h b/vendor/stb/stb_dxt.h
new file mode 100644
index 0000000..6150a87
--- /dev/null
+++ b/vendor/stb/stb_dxt.h
@@ -0,0 +1,719 @@
+// stb_dxt.h - v1.12 - DXT1/DXT5 compressor - public domain
+// original by fabian "ryg" giesen - ported to C by stb
+// use '#define STB_DXT_IMPLEMENTATION' before including to create the implementation
+//
+// USAGE:
+// call stb_compress_dxt_block() for every block (you must pad)
+// source should be a 4x4 block of RGBA data in row-major order;
+// Alpha channel is not stored if you specify alpha=0 (but you
+// must supply some constant alpha in the alpha channel).
+// You can turn on dithering and "high quality" using mode.
+//
+// version history:
+// v1.12 - (ryg) fix bug in single-color table generator
+// v1.11 - (ryg) avoid racy global init, better single-color tables, remove dither
+// v1.10 - (i.c) various small quality improvements
+// v1.09 - (stb) update documentation re: surprising alpha channel requirement
+// v1.08 - (stb) fix bug in dxt-with-alpha block
+// v1.07 - (stb) bc4; allow not using libc; add STB_DXT_STATIC
+// v1.06 - (stb) fix to known-broken 1.05
+// v1.05 - (stb) support bc5/3dc (Arvids Kokins), use extern "C" in C++ (Pavel Krajcevski)
+// v1.04 - (ryg) default to no rounding bias for lerped colors (as per S3TC/DX10 spec);
+// single color match fix (allow for inexact color interpolation);
+// optimal DXT5 index finder; "high quality" mode that runs multiple refinement steps.
+// v1.03 - (stb) endianness support
+// v1.02 - (stb) fix alpha encoding bug
+// v1.01 - (stb) fix bug converting to RGB that messed up quality, thanks ryg & cbloom
+// v1.00 - (stb) first release
+//
+// contributors:
+// Rich Geldreich (more accurate index selection)
+// Kevin Schmidt (#defines for "freestanding" compilation)
+// github:ppiastucki (BC4 support)
+// Ignacio Castano - improve DXT endpoint quantization
+// Alan Hickman - static table initialization
+//
+// LICENSE
+//
+// See end of file for license information.
+
+#ifndef STB_INCLUDE_STB_DXT_H
+#define STB_INCLUDE_STB_DXT_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef STB_DXT_STATIC
+#define STBDDEF static
+#else
+#define STBDDEF extern
+#endif
+
+// compression mode (bitflags)
+#define STB_DXT_NORMAL 0
+#define STB_DXT_DITHER 1 // use dithering. was always dubious, now deprecated. does nothing!
+#define STB_DXT_HIGHQUAL 2 // high quality mode, does two refinement steps instead of 1. ~30-40% slower.
+
+STBDDEF void stb_compress_dxt_block(unsigned char *dest, const unsigned char *src_rgba_four_bytes_per_pixel, int alpha, int mode);
+STBDDEF void stb_compress_bc4_block(unsigned char *dest, const unsigned char *src_r_one_byte_per_pixel);
+STBDDEF void stb_compress_bc5_block(unsigned char *dest, const unsigned char *src_rg_two_byte_per_pixel);
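+// Example: DXT1 (alpha=0) writes 8 bytes per 4x4 block; DXT5 (alpha=1)
+// writes 16 bytes. A typical call site looks like:
+//    unsigned char out[16];                                  // room for either format
+//    stb_compress_dxt_block(out, rgba_4x4, 1, STB_DXT_HIGHQUAL); // DXT5, 16 bytes
+//    stb_compress_dxt_block(out, rgba_4x4, 0, STB_DXT_NORMAL);   // DXT1, 8 bytes used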
+
+#define STB_COMPRESS_DXT_BLOCK
+
+#ifdef __cplusplus
+}
+#endif
+#endif // STB_INCLUDE_STB_DXT_H
+
+#ifdef STB_DXT_IMPLEMENTATION
+
+// configuration options for DXT encoder. set them in the project/makefile or just define
+// them at the top.
+
+// STB_DXT_USE_ROUNDING_BIAS
+// use a rounding bias during color interpolation. this is closer to what "ideal"
+// interpolation would do but doesn't match the S3TC/DX10 spec. old versions (pre-1.03)
+// implicitly had this turned on.
+//
+// in case you're targeting a specific type of hardware (e.g. console programmers):
+// NVidia and Intel GPUs (as of 2010) as well as DX9 ref use DXT decoders that are closer
+// to STB_DXT_USE_ROUNDING_BIAS. AMD/ATI, S3 and DX10 ref are closer to rounding with no bias.
+// you also see "(a*5 + b*3) / 8" on some old GPU designs.
+// #define STB_DXT_USE_ROUNDING_BIAS
+
+#include <stdlib.h>
+
+#if !defined(STBD_FABS)
+#include <math.h>
+#endif
+
+#ifndef STBD_FABS
+#define STBD_FABS(x) fabs(x)
+#endif
+
+static const unsigned char stb__OMatch5[256][2] = {
+ { 0, 0 }, { 0, 0 }, { 0, 1 }, { 0, 1 }, { 1, 0 }, { 1, 0 }, { 1, 0 }, { 1, 1 },
+ { 1, 1 }, { 1, 1 }, { 1, 2 }, { 0, 4 }, { 2, 1 }, { 2, 1 }, { 2, 1 }, { 2, 2 },
+ { 2, 2 }, { 2, 2 }, { 2, 3 }, { 1, 5 }, { 3, 2 }, { 3, 2 }, { 4, 0 }, { 3, 3 },
+ { 3, 3 }, { 3, 3 }, { 3, 4 }, { 3, 4 }, { 3, 4 }, { 3, 5 }, { 4, 3 }, { 4, 3 },
+ { 5, 2 }, { 4, 4 }, { 4, 4 }, { 4, 5 }, { 4, 5 }, { 5, 4 }, { 5, 4 }, { 5, 4 },
+ { 6, 3 }, { 5, 5 }, { 5, 5 }, { 5, 6 }, { 4, 8 }, { 6, 5 }, { 6, 5 }, { 6, 5 },
+ { 6, 6 }, { 6, 6 }, { 6, 6 }, { 6, 7 }, { 5, 9 }, { 7, 6 }, { 7, 6 }, { 8, 4 },
+ { 7, 7 }, { 7, 7 }, { 7, 7 }, { 7, 8 }, { 7, 8 }, { 7, 8 }, { 7, 9 }, { 8, 7 },
+ { 8, 7 }, { 9, 6 }, { 8, 8 }, { 8, 8 }, { 8, 9 }, { 8, 9 }, { 9, 8 }, { 9, 8 },
+ { 9, 8 }, { 10, 7 }, { 9, 9 }, { 9, 9 }, { 9, 10 }, { 8, 12 }, { 10, 9 }, { 10, 9 },
+ { 10, 9 }, { 10, 10 }, { 10, 10 }, { 10, 10 }, { 10, 11 }, { 9, 13 }, { 11, 10 }, { 11, 10 },
+ { 12, 8 }, { 11, 11 }, { 11, 11 }, { 11, 11 }, { 11, 12 }, { 11, 12 }, { 11, 12 }, { 11, 13 },
+ { 12, 11 }, { 12, 11 }, { 13, 10 }, { 12, 12 }, { 12, 12 }, { 12, 13 }, { 12, 13 }, { 13, 12 },
+ { 13, 12 }, { 13, 12 }, { 14, 11 }, { 13, 13 }, { 13, 13 }, { 13, 14 }, { 12, 16 }, { 14, 13 },
+ { 14, 13 }, { 14, 13 }, { 14, 14 }, { 14, 14 }, { 14, 14 }, { 14, 15 }, { 13, 17 }, { 15, 14 },
+ { 15, 14 }, { 16, 12 }, { 15, 15 }, { 15, 15 }, { 15, 15 }, { 15, 16 }, { 15, 16 }, { 15, 16 },
+ { 15, 17 }, { 16, 15 }, { 16, 15 }, { 17, 14 }, { 16, 16 }, { 16, 16 }, { 16, 17 }, { 16, 17 },
+ { 17, 16 }, { 17, 16 }, { 17, 16 }, { 18, 15 }, { 17, 17 }, { 17, 17 }, { 17, 18 }, { 16, 20 },
+ { 18, 17 }, { 18, 17 }, { 18, 17 }, { 18, 18 }, { 18, 18 }, { 18, 18 }, { 18, 19 }, { 17, 21 },
+ { 19, 18 }, { 19, 18 }, { 20, 16 }, { 19, 19 }, { 19, 19 }, { 19, 19 }, { 19, 20 }, { 19, 20 },
+ { 19, 20 }, { 19, 21 }, { 20, 19 }, { 20, 19 }, { 21, 18 }, { 20, 20 }, { 20, 20 }, { 20, 21 },
+ { 20, 21 }, { 21, 20 }, { 21, 20 }, { 21, 20 }, { 22, 19 }, { 21, 21 }, { 21, 21 }, { 21, 22 },
+ { 20, 24 }, { 22, 21 }, { 22, 21 }, { 22, 21 }, { 22, 22 }, { 22, 22 }, { 22, 22 }, { 22, 23 },
+ { 21, 25 }, { 23, 22 }, { 23, 22 }, { 24, 20 }, { 23, 23 }, { 23, 23 }, { 23, 23 }, { 23, 24 },
+ { 23, 24 }, { 23, 24 }, { 23, 25 }, { 24, 23 }, { 24, 23 }, { 25, 22 }, { 24, 24 }, { 24, 24 },
+ { 24, 25 }, { 24, 25 }, { 25, 24 }, { 25, 24 }, { 25, 24 }, { 26, 23 }, { 25, 25 }, { 25, 25 },
+ { 25, 26 }, { 24, 28 }, { 26, 25 }, { 26, 25 }, { 26, 25 }, { 26, 26 }, { 26, 26 }, { 26, 26 },
+ { 26, 27 }, { 25, 29 }, { 27, 26 }, { 27, 26 }, { 28, 24 }, { 27, 27 }, { 27, 27 }, { 27, 27 },
+ { 27, 28 }, { 27, 28 }, { 27, 28 }, { 27, 29 }, { 28, 27 }, { 28, 27 }, { 29, 26 }, { 28, 28 },
+ { 28, 28 }, { 28, 29 }, { 28, 29 }, { 29, 28 }, { 29, 28 }, { 29, 28 }, { 30, 27 }, { 29, 29 },
+ { 29, 29 }, { 29, 30 }, { 29, 30 }, { 30, 29 }, { 30, 29 }, { 30, 29 }, { 30, 30 }, { 30, 30 },
+ { 30, 30 }, { 30, 31 }, { 30, 31 }, { 31, 30 }, { 31, 30 }, { 31, 30 }, { 31, 31 }, { 31, 31 },
+};
+static const unsigned char stb__OMatch6[256][2] = {
+ { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 }, { 1, 1 }, { 1, 2 }, { 2, 1 }, { 2, 2 },
+ { 2, 2 }, { 2, 3 }, { 3, 2 }, { 3, 3 }, { 3, 3 }, { 3, 4 }, { 4, 3 }, { 4, 4 },
+ { 4, 4 }, { 4, 5 }, { 5, 4 }, { 5, 5 }, { 5, 5 }, { 5, 6 }, { 6, 5 }, { 6, 6 },
+ { 6, 6 }, { 6, 7 }, { 7, 6 }, { 7, 7 }, { 7, 7 }, { 7, 8 }, { 8, 7 }, { 8, 8 },
+ { 8, 8 }, { 8, 9 }, { 9, 8 }, { 9, 9 }, { 9, 9 }, { 9, 10 }, { 10, 9 }, { 10, 10 },
+ { 10, 10 }, { 10, 11 }, { 11, 10 }, { 8, 16 }, { 11, 11 }, { 11, 12 }, { 12, 11 }, { 9, 17 },
+ { 12, 12 }, { 12, 13 }, { 13, 12 }, { 11, 16 }, { 13, 13 }, { 13, 14 }, { 14, 13 }, { 12, 17 },
+ { 14, 14 }, { 14, 15 }, { 15, 14 }, { 14, 16 }, { 15, 15 }, { 15, 16 }, { 16, 14 }, { 16, 15 },
+ { 17, 14 }, { 16, 16 }, { 16, 17 }, { 17, 16 }, { 18, 15 }, { 17, 17 }, { 17, 18 }, { 18, 17 },
+ { 20, 14 }, { 18, 18 }, { 18, 19 }, { 19, 18 }, { 21, 15 }, { 19, 19 }, { 19, 20 }, { 20, 19 },
+ { 20, 20 }, { 20, 20 }, { 20, 21 }, { 21, 20 }, { 21, 21 }, { 21, 21 }, { 21, 22 }, { 22, 21 },
+ { 22, 22 }, { 22, 22 }, { 22, 23 }, { 23, 22 }, { 23, 23 }, { 23, 23 }, { 23, 24 }, { 24, 23 },
+ { 24, 24 }, { 24, 24 }, { 24, 25 }, { 25, 24 }, { 25, 25 }, { 25, 25 }, { 25, 26 }, { 26, 25 },
+ { 26, 26 }, { 26, 26 }, { 26, 27 }, { 27, 26 }, { 24, 32 }, { 27, 27 }, { 27, 28 }, { 28, 27 },
+ { 25, 33 }, { 28, 28 }, { 28, 29 }, { 29, 28 }, { 27, 32 }, { 29, 29 }, { 29, 30 }, { 30, 29 },
+ { 28, 33 }, { 30, 30 }, { 30, 31 }, { 31, 30 }, { 30, 32 }, { 31, 31 }, { 31, 32 }, { 32, 30 },
+ { 32, 31 }, { 33, 30 }, { 32, 32 }, { 32, 33 }, { 33, 32 }, { 34, 31 }, { 33, 33 }, { 33, 34 },
+ { 34, 33 }, { 36, 30 }, { 34, 34 }, { 34, 35 }, { 35, 34 }, { 37, 31 }, { 35, 35 }, { 35, 36 },
+ { 36, 35 }, { 36, 36 }, { 36, 36 }, { 36, 37 }, { 37, 36 }, { 37, 37 }, { 37, 37 }, { 37, 38 },
+ { 38, 37 }, { 38, 38 }, { 38, 38 }, { 38, 39 }, { 39, 38 }, { 39, 39 }, { 39, 39 }, { 39, 40 },
+ { 40, 39 }, { 40, 40 }, { 40, 40 }, { 40, 41 }, { 41, 40 }, { 41, 41 }, { 41, 41 }, { 41, 42 },
+ { 42, 41 }, { 42, 42 }, { 42, 42 }, { 42, 43 }, { 43, 42 }, { 40, 48 }, { 43, 43 }, { 43, 44 },
+ { 44, 43 }, { 41, 49 }, { 44, 44 }, { 44, 45 }, { 45, 44 }, { 43, 48 }, { 45, 45 }, { 45, 46 },
+ { 46, 45 }, { 44, 49 }, { 46, 46 }, { 46, 47 }, { 47, 46 }, { 46, 48 }, { 47, 47 }, { 47, 48 },
+ { 48, 46 }, { 48, 47 }, { 49, 46 }, { 48, 48 }, { 48, 49 }, { 49, 48 }, { 50, 47 }, { 49, 49 },
+ { 49, 50 }, { 50, 49 }, { 52, 46 }, { 50, 50 }, { 50, 51 }, { 51, 50 }, { 53, 47 }, { 51, 51 },
+ { 51, 52 }, { 52, 51 }, { 52, 52 }, { 52, 52 }, { 52, 53 }, { 53, 52 }, { 53, 53 }, { 53, 53 },
+ { 53, 54 }, { 54, 53 }, { 54, 54 }, { 54, 54 }, { 54, 55 }, { 55, 54 }, { 55, 55 }, { 55, 55 },
+ { 55, 56 }, { 56, 55 }, { 56, 56 }, { 56, 56 }, { 56, 57 }, { 57, 56 }, { 57, 57 }, { 57, 57 },
+ { 57, 58 }, { 58, 57 }, { 58, 58 }, { 58, 58 }, { 58, 59 }, { 59, 58 }, { 59, 59 }, { 59, 59 },
+ { 59, 60 }, { 60, 59 }, { 60, 60 }, { 60, 60 }, { 60, 61 }, { 61, 60 }, { 61, 61 }, { 61, 61 },
+ { 61, 62 }, { 62, 61 }, { 62, 62 }, { 62, 62 }, { 62, 63 }, { 63, 62 }, { 63, 63 }, { 63, 63 },
+};
+
+static int stb__Mul8Bit(int a, int b)
+{
+ int t = a*b + 128;
+ return (t + (t >> 8)) >> 8;
+}
+
+static void stb__From16Bit(unsigned char *out, unsigned short v)
+{
+ int rv = (v & 0xf800) >> 11;
+ int gv = (v & 0x07e0) >> 5;
+ int bv = (v & 0x001f) >> 0;
+
+ // expand to 8 bits via bit replication
+ out[0] = (rv * 33) >> 2;
+ out[1] = (gv * 65) >> 4;
+ out[2] = (bv * 33) >> 2;
+ out[3] = 0;
+}
+
+static unsigned short stb__As16Bit(int r, int g, int b)
+{
+ return (unsigned short)((stb__Mul8Bit(r,31) << 11) + (stb__Mul8Bit(g,63) << 5) + stb__Mul8Bit(b,31));
+}
+
+// linear interpolation at 1/3 point between a and b, using desired rounding type
+static int stb__Lerp13(int a, int b)
+{
+#ifdef STB_DXT_USE_ROUNDING_BIAS
+ // with rounding bias
+ return a + stb__Mul8Bit(b-a, 0x55);
+#else
+ // without rounding bias
+ // replace "/ 3" by "* 0xaaab) >> 17" if your compiler sucks or you really need every ounce of speed.
+ return (2*a + b) / 3;
+#endif
+}
+
+// lerp RGB color
+static void stb__Lerp13RGB(unsigned char *out, unsigned char *p1, unsigned char *p2)
+{
+ out[0] = (unsigned char)stb__Lerp13(p1[0], p2[0]);
+ out[1] = (unsigned char)stb__Lerp13(p1[1], p2[1]);
+ out[2] = (unsigned char)stb__Lerp13(p1[2], p2[2]);
+}
+
+/****************************************************************************/
+
+static void stb__EvalColors(unsigned char *color,unsigned short c0,unsigned short c1)
+{
+ stb__From16Bit(color+ 0, c0);
+ stb__From16Bit(color+ 4, c1);
+ stb__Lerp13RGB(color+ 8, color+0, color+4);
+ stb__Lerp13RGB(color+12, color+4, color+0);
+}
+
+// The color matching function
+static unsigned int stb__MatchColorsBlock(unsigned char *block, unsigned char *color)
+{
+ unsigned int mask = 0;
+ int dirr = color[0*4+0] - color[1*4+0];
+ int dirg = color[0*4+1] - color[1*4+1];
+ int dirb = color[0*4+2] - color[1*4+2];
+ int dots[16];
+ int stops[4];
+ int i;
+ int c0Point, halfPoint, c3Point;
+
+ for(i=0;i<16;i++)
+ dots[i] = block[i*4+0]*dirr + block[i*4+1]*dirg + block[i*4+2]*dirb;
+
+ for(i=0;i<4;i++)
+ stops[i] = color[i*4+0]*dirr + color[i*4+1]*dirg + color[i*4+2]*dirb;
+
+ // think of the colors as arranged on a line; project point onto that line, then choose
+ // next color out of available ones. we compute the crossover points for "best color in top
+ // half"/"best in bottom half" and then the same inside that subinterval.
+ //
+ // relying on this 1d approximation isn't always optimal in terms of euclidean distance,
+ // but it's very close and a lot faster.
+ // http://cbloomrants.blogspot.com/2008/12/12-08-08-dxtc-summary.html
+
+ c0Point = (stops[1] + stops[3]);
+ halfPoint = (stops[3] + stops[2]);
+ c3Point = (stops[2] + stops[0]);
+
+ for (i=15;i>=0;i--) {
+ int dot = dots[i]*2;
+ mask <<= 2;
+
+ if(dot < halfPoint)
+ mask |= (dot < c0Point) ? 1 : 3;
+ else
+ mask |= (dot < c3Point) ? 2 : 0;
+ }
+
+ return mask;
+}
+
+// The color optimization function. (Clever code, part 1)
+static void stb__OptimizeColorsBlock(unsigned char *block, unsigned short *pmax16, unsigned short *pmin16)
+{
+ int mind,maxd;
+ unsigned char *minp, *maxp;
+ double magn;
+ int v_r,v_g,v_b;
+ static const int nIterPower = 4;
+ float covf[6],vfr,vfg,vfb;
+
+ // determine color distribution
+ int cov[6];
+ int mu[3],min[3],max[3];
+ int ch,i,iter;
+
+ for(ch=0;ch<3;ch++)
+ {
+ const unsigned char *bp = ((const unsigned char *) block) + ch;
+ int muv,minv,maxv;
+
+ muv = minv = maxv = bp[0];
+ for(i=4;i<64;i+=4)
+ {
+ muv += bp[i];
+ if (bp[i] < minv) minv = bp[i];
+ else if (bp[i] > maxv) maxv = bp[i];
+ }
+
+ mu[ch] = (muv + 8) >> 4;
+ min[ch] = minv;
+ max[ch] = maxv;
+ }
+
+ // determine covariance matrix
+ for (i=0;i<6;i++)
+ cov[i] = 0;
+
+ for (i=0;i<16;i++)
+ {
+ int r = block[i*4+0] - mu[0];
+ int g = block[i*4+1] - mu[1];
+ int b = block[i*4+2] - mu[2];
+
+ cov[0] += r*r;
+ cov[1] += r*g;
+ cov[2] += r*b;
+ cov[3] += g*g;
+ cov[4] += g*b;
+ cov[5] += b*b;
+ }
+
+ // convert covariance matrix to float, find principal axis via power iter
+ for(i=0;i<6;i++)
+ covf[i] = cov[i] / 255.0f;
+
+ vfr = (float) (max[0] - min[0]);
+ vfg = (float) (max[1] - min[1]);
+ vfb = (float) (max[2] - min[2]);
+
+   for(iter=0;iter<nIterPower;iter++)
+   {
+      // matrix multiply
+      float r = vfr*covf[0] + vfg*covf[1] + vfb*covf[2];
+      float g = vfr*covf[1] + vfg*covf[3] + vfb*covf[4];
+      float b = vfr*covf[2] + vfg*covf[4] + vfb*covf[5];
+
+      vfr = r;
+      vfg = g;
+      vfb = b;
+   }
+
+   magn = STBD_FABS(vfr);
+   if (STBD_FABS(vfg) > magn) magn = STBD_FABS(vfg);
+ if (STBD_FABS(vfb) > magn) magn = STBD_FABS(vfb);
+
+ if(magn < 4.0f) { // too small, default to luminance
+ v_r = 299; // JPEG YCbCr luma coefs, scaled by 1000.
+ v_g = 587;
+ v_b = 114;
+ } else {
+ magn = 512.0 / magn;
+ v_r = (int) (vfr * magn);
+ v_g = (int) (vfg * magn);
+ v_b = (int) (vfb * magn);
+ }
+
+ minp = maxp = block;
+ mind = maxd = block[0]*v_r + block[1]*v_g + block[2]*v_b;
+ // Pick colors at extreme points
+ for(i=1;i<16;i++)
+ {
+ int dot = block[i*4+0]*v_r + block[i*4+1]*v_g + block[i*4+2]*v_b;
+
+ if (dot < mind) {
+ mind = dot;
+ minp = block+i*4;
+ }
+
+ if (dot > maxd) {
+ maxd = dot;
+ maxp = block+i*4;
+ }
+ }
+
+ *pmax16 = stb__As16Bit(maxp[0],maxp[1],maxp[2]);
+ *pmin16 = stb__As16Bit(minp[0],minp[1],minp[2]);
+}
+
+static const float stb__midpoints5[32] = {
+ 0.015686f, 0.047059f, 0.078431f, 0.111765f, 0.145098f, 0.176471f, 0.207843f, 0.241176f, 0.274510f, 0.305882f, 0.337255f, 0.370588f, 0.403922f, 0.435294f, 0.466667f, 0.5f,
+ 0.533333f, 0.564706f, 0.596078f, 0.629412f, 0.662745f, 0.694118f, 0.725490f, 0.758824f, 0.792157f, 0.823529f, 0.854902f, 0.888235f, 0.921569f, 0.952941f, 0.984314f, 1.0f
+};
+
+static const float stb__midpoints6[64] = {
+ 0.007843f, 0.023529f, 0.039216f, 0.054902f, 0.070588f, 0.086275f, 0.101961f, 0.117647f, 0.133333f, 0.149020f, 0.164706f, 0.180392f, 0.196078f, 0.211765f, 0.227451f, 0.245098f,
+ 0.262745f, 0.278431f, 0.294118f, 0.309804f, 0.325490f, 0.341176f, 0.356863f, 0.372549f, 0.388235f, 0.403922f, 0.419608f, 0.435294f, 0.450980f, 0.466667f, 0.482353f, 0.500000f,
+ 0.517647f, 0.533333f, 0.549020f, 0.564706f, 0.580392f, 0.596078f, 0.611765f, 0.627451f, 0.643137f, 0.658824f, 0.674510f, 0.690196f, 0.705882f, 0.721569f, 0.737255f, 0.754902f,
+ 0.772549f, 0.788235f, 0.803922f, 0.819608f, 0.835294f, 0.850980f, 0.866667f, 0.882353f, 0.898039f, 0.913725f, 0.929412f, 0.945098f, 0.960784f, 0.976471f, 0.992157f, 1.0f
+};
+
+static unsigned short stb__Quantize5(float x)
+{
+ unsigned short q;
+ x = x < 0 ? 0 : x > 1 ? 1 : x; // saturate
+ q = (unsigned short)(x * 31);
+ q += (x > stb__midpoints5[q]);
+ return q;
+}
+
+static unsigned short stb__Quantize6(float x)
+{
+ unsigned short q;
+ x = x < 0 ? 0 : x > 1 ? 1 : x; // saturate
+ q = (unsigned short)(x * 63);
+ q += (x > stb__midpoints6[q]);
+ return q;
+}
+
+// The refinement function. (Clever code, part 2)
+// Tries to optimize colors to suit block contents better.
+// (By solving a least squares system via normal equations+Cramer's rule)
+static int stb__RefineBlock(unsigned char *block, unsigned short *pmax16, unsigned short *pmin16, unsigned int mask)
+{
+ static const int w1Tab[4] = { 3,0,2,1 };
+ static const int prods[4] = { 0x090000,0x000900,0x040102,0x010402 };
+ // ^some magic to save a lot of multiplies in the accumulating loop...
+ // (precomputed products of weights for least squares system, accumulated inside one 32-bit register)
+
+ float f;
+ unsigned short oldMin, oldMax, min16, max16;
+ int i, akku = 0, xx,xy,yy;
+ int At1_r,At1_g,At1_b;
+ int At2_r,At2_g,At2_b;
+ unsigned int cm = mask;
+
+ oldMin = *pmin16;
+ oldMax = *pmax16;
+
+ if((mask ^ (mask<<2)) < 4) // all pixels have the same index?
+ {
+ // yes, linear system would be singular; solve using optimal
+ // single-color match on average color
+ int r = 8, g = 8, b = 8;
+ for (i=0;i<16;++i) {
+ r += block[i*4+0];
+ g += block[i*4+1];
+ b += block[i*4+2];
+ }
+
+ r >>= 4; g >>= 4; b >>= 4;
+
+ max16 = (stb__OMatch5[r][0]<<11) | (stb__OMatch6[g][0]<<5) | stb__OMatch5[b][0];
+ min16 = (stb__OMatch5[r][1]<<11) | (stb__OMatch6[g][1]<<5) | stb__OMatch5[b][1];
+ } else {
+ At1_r = At1_g = At1_b = 0;
+ At2_r = At2_g = At2_b = 0;
+ for (i=0;i<16;++i,cm>>=2) {
+ int step = cm&3;
+ int w1 = w1Tab[step];
+ int r = block[i*4+0];
+ int g = block[i*4+1];
+ int b = block[i*4+2];
+
+ akku += prods[step];
+ At1_r += w1*r;
+ At1_g += w1*g;
+ At1_b += w1*b;
+ At2_r += r;
+ At2_g += g;
+ At2_b += b;
+ }
+
+ At2_r = 3*At2_r - At1_r;
+ At2_g = 3*At2_g - At1_g;
+ At2_b = 3*At2_b - At1_b;
+
+ // extract solutions and decide solvability
+ xx = akku >> 16;
+ yy = (akku >> 8) & 0xff;
+ xy = (akku >> 0) & 0xff;
+
+ f = 3.0f / 255.0f / (xx*yy - xy*xy);
+
+ max16 = stb__Quantize5((At1_r*yy - At2_r * xy) * f) << 11;
+ max16 |= stb__Quantize6((At1_g*yy - At2_g * xy) * f) << 5;
+ max16 |= stb__Quantize5((At1_b*yy - At2_b * xy) * f) << 0;
+
+ min16 = stb__Quantize5((At2_r*xx - At1_r * xy) * f) << 11;
+ min16 |= stb__Quantize6((At2_g*xx - At1_g * xy) * f) << 5;
+ min16 |= stb__Quantize5((At2_b*xx - At1_b * xy) * f) << 0;
+ }
+
+ *pmin16 = min16;
+ *pmax16 = max16;
+ return oldMin != min16 || oldMax != max16;
+}
+
+// Color block compression
+static void stb__CompressColorBlock(unsigned char *dest, unsigned char *block, int mode)
+{
+ unsigned int mask;
+ int i;
+ int refinecount;
+ unsigned short max16, min16;
+ unsigned char color[4*4];
+
+ refinecount = (mode & STB_DXT_HIGHQUAL) ? 2 : 1;
+
+ // check if block is constant
+ for (i=1;i<16;i++)
+ if (((unsigned int *) block)[i] != ((unsigned int *) block)[0])
+ break;
+
+ if(i == 16) { // constant color
+ int r = block[0], g = block[1], b = block[2];
+ mask = 0xaaaaaaaa;
+ max16 = (stb__OMatch5[r][0]<<11) | (stb__OMatch6[g][0]<<5) | stb__OMatch5[b][0];
+ min16 = (stb__OMatch5[r][1]<<11) | (stb__OMatch6[g][1]<<5) | stb__OMatch5[b][1];
+ } else {
+ // first step: PCA+map along principal axis
+ stb__OptimizeColorsBlock(block,&max16,&min16);
+ if (max16 != min16) {
+ stb__EvalColors(color,max16,min16);
+ mask = stb__MatchColorsBlock(block,color);
+ } else
+ mask = 0;
+
+ // third step: refine (multiple times if requested)
+      for (i=0;i<refinecount;i++) {
+         unsigned int lastmask = mask;
+
+         if (stb__RefineBlock(block,&max16,&min16,mask)) {
+            if (max16 != min16) {
+               stb__EvalColors(color,max16,min16);
+               mask = stb__MatchColorsBlock(block,color);
+            } else {
+               mask = 0;
+               break;
+            }
+         }
+
+         if(mask == lastmask)
+            break;
+      }
+   }
+
+   // write the color block
+   if(max16 < min16)
+   {
+      unsigned short t = min16;
+      min16 = max16;
+      max16 = t;
+      mask ^= 0x55555555;
+   }
+
+   dest[0] = (unsigned char) (max16);
+   dest[1] = (unsigned char) (max16 >> 8);
+ dest[2] = (unsigned char) (min16);
+ dest[3] = (unsigned char) (min16 >> 8);
+ dest[4] = (unsigned char) (mask);
+ dest[5] = (unsigned char) (mask >> 8);
+ dest[6] = (unsigned char) (mask >> 16);
+ dest[7] = (unsigned char) (mask >> 24);
+}
+
+// Alpha block compression (this is easy for a change)
+static void stb__CompressAlphaBlock(unsigned char *dest,unsigned char *src, int stride)
+{
+ int i,dist,bias,dist4,dist2,bits,mask;
+
+ // find min/max color
+ int mn,mx;
+ mn = mx = src[0];
+
+ for (i=1;i<16;i++)
+ {
+ if (src[i*stride] < mn) mn = src[i*stride];
+ else if (src[i*stride] > mx) mx = src[i*stride];
+ }
+
+ // encode them
+ dest[0] = (unsigned char)mx;
+ dest[1] = (unsigned char)mn;
+ dest += 2;
+
+ // determine bias and emit color indices
+ // given the choice of mx/mn, these indices are optimal:
+ // http://fgiesen.wordpress.com/2009/12/15/dxt5-alpha-block-index-determination/
+ dist = mx-mn;
+ dist4 = dist*4;
+ dist2 = dist*2;
+ bias = (dist < 8) ? (dist - 1) : (dist/2 + 2);
+ bias -= mn * 7;
+ bits = 0,mask=0;
+
+ for (i=0;i<16;i++) {
+ int a = src[i*stride]*7 + bias;
+ int ind,t;
+
+ // select index. this is a "linear scale" lerp factor between 0 (val=min) and 7 (val=max).
+ t = (a >= dist4) ? -1 : 0; ind = t & 4; a -= dist4 & t;
+ t = (a >= dist2) ? -1 : 0; ind += t & 2; a -= dist2 & t;
+ ind += (a >= dist);
+
+ // turn linear scale into DXT index (0/1 are extremal pts)
+ ind = -ind & 7;
+ ind ^= (2 > ind);
+
+ // write index
+ mask |= ind << bits;
+ if((bits += 3) >= 8) {
+ *dest++ = (unsigned char)mask;
+ mask >>= 8;
+ bits -= 8;
+ }
+ }
+}
+
+void stb_compress_dxt_block(unsigned char *dest, const unsigned char *src, int alpha, int mode)
+{
+ unsigned char data[16][4];
+ if (alpha) {
+ int i;
+ stb__CompressAlphaBlock(dest,(unsigned char*) src+3, 4);
+ dest += 8;
+ // make a new copy of the data in which alpha is opaque,
+ // because code uses a fast test for color constancy
+ memcpy(data, src, 4*16);
+ for (i=0; i < 16; ++i)
+ data[i][3] = 255;
+ src = &data[0][0];
+ }
+
+ stb__CompressColorBlock(dest,(unsigned char*) src,mode);
+}
+
+void stb_compress_bc4_block(unsigned char *dest, const unsigned char *src)
+{
+ stb__CompressAlphaBlock(dest,(unsigned char*) src, 1);
+}
+
+void stb_compress_bc5_block(unsigned char *dest, const unsigned char *src)
+{
+ stb__CompressAlphaBlock(dest,(unsigned char*) src,2);
+ stb__CompressAlphaBlock(dest + 8,(unsigned char*) src+1,2);
+}
+#endif // STB_DXT_IMPLEMENTATION
+
+// Compile with STB_DXT_IMPLEMENTATION and STB_DXT_GENERATE_TABLES
+// defined to generate the tables above.
+#ifdef STB_DXT_GENERATE_TABLES
+#include <stdio.h>
+#include <stdlib.h> // for abs()
+
+int main()
+{
+ int i, j;
+ const char *omatch_names[] = { "stb__OMatch5", "stb__OMatch6" };
+ int dequant_mults[2] = { 33*4, 65 }; // .4 fixed-point dequant multipliers
+
+ // optimal endpoint tables
+ for (i = 0; i < 2; ++i) {
+ int dequant = dequant_mults[i];
+ int size = i ? 64 : 32;
+ printf("static const unsigned char %s[256][2] = {\n", omatch_names[i]);
+ for (int j = 0; j < 256; ++j) {
+ int mn, mx;
+ int best_mn = 0, best_mx = 0;
+ int best_err = 256 * 100;
+         for (mn=0;mn<size;mn++) {
+            for (mx=0;mx<size;mx++) {
+               int mine = (mn * dequant) >> 4;
+ int maxe = (mx * dequant) >> 4;
+ int err = abs(stb__Lerp13(maxe, mine) - j) * 100;
+
+ // DX10 spec says that interpolation must be within 3% of "correct" result,
+ // add this as error term. Normally we'd expect a random distribution of
+ // +-1.5% error, but nowhere in the spec does it say that the error has to be
+ // unbiased - better safe than sorry.
+ err += abs(maxe - mine) * 3;
+
+ if(err < best_err) {
+ best_mn = mn;
+ best_mx = mx;
+ best_err = err;
+ }
+ }
+ }
+         if ((j % 8) == 0) printf("  "); // 2 spaces, third is done below
+ printf(" { %2d, %2d },", best_mx, best_mn);
+ if ((j % 8) == 7) printf("\n");
+ }
+ printf("};\n");
+ }
+
+ return 0;
+}
+#endif
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_easy_font.h b/vendor/stb/stb_easy_font.h
new file mode 100644
index 0000000..b663258
--- /dev/null
+++ b/vendor/stb/stb_easy_font.h
@@ -0,0 +1,305 @@
+// stb_easy_font.h - v1.1 - bitmap font for 3D rendering - public domain
+// Sean Barrett, Feb 2015
+//
+// Easy-to-deploy,
+// reasonably compact,
+// extremely inefficient performance-wise,
+// crappy-looking,
+// ASCII-only,
+// bitmap font for use in 3D APIs.
+//
+// Intended for when you just want to get some text displaying
+// in a 3D app as quickly as possible.
+//
+// Doesn't use any textures, instead builds characters out of quads.
+//
+// DOCUMENTATION:
+//
+// int stb_easy_font_width(char *text)
+// int stb_easy_font_height(char *text)
+//
+// Takes a string and returns the horizontal size and the
+// vertical size (which can vary if 'text' has newlines).
+//
+// int stb_easy_font_print(float x, float y,
+// char *text, unsigned char color[4],
+// void *vertex_buffer, int vbuf_size)
+//
+// Takes a string (which can contain '\n') and fills out a
+// vertex buffer with renderable data to draw the string.
+// Output data assumes increasing x is rightwards, increasing y
+// is downwards.
+//
+// The vertex data is divided into quads, i.e. there are four
+// vertices in the vertex buffer for each quad.
+//
+// The vertices are stored in an interleaved format:
+//
+// x:float
+// y:float
+// z:float
+// color:uint8[4]
+//
+//   You can ignore z and color if you get them from elsewhere.
+// This format was chosen in the hopes it would make it
+// easier for you to reuse existing vertex-buffer-drawing code.
+//
+// If you pass in NULL for color, it becomes 255,255,255,255.
+//
+// Returns the number of quads.
+//
+// If the buffer isn't large enough, it will truncate.
+// Expect it to use an average of ~270 bytes per character.
+//
+// If your API doesn't draw quads, build a reusable index
+// list that allows you to render quads as indexed triangles.
+//
+// void stb_easy_font_spacing(float spacing)
+//
+// Use positive values to expand the space between characters,
+// and small negative values (no smaller than -1.5) to contract
+// the space between characters.
+//
+// E.g. spacing = 1 adds one "pixel" of spacing between the
+// characters. spacing = -1 is reasonable but feels a bit too
+// compact to me; -0.5 is a reasonable compromise as long as
+// you're scaling the font up.
+//
+// LICENSE
+//
+// See end of file for license information.
+//
+// VERSION HISTORY
+//
+// (2020-02-02) 1.1 make everything static so can compile it in more than one src file
+// (2017-01-15) 1.0 space character takes same space as numbers; fix bad spacing of 'f'
+// (2016-01-22) 0.7 width() supports multiline text; add height()
+//   (2015-09-13)  0.6   #include <math.h>; updated license
+// (2015-02-01) 0.5 First release
+//
+// CONTRIBUTORS
+//
+// github:vassvik -- bug report
+// github:podsvirov -- fix multiple definition errors
+
+#if 0
+// SAMPLE CODE:
+//
+// Here's sample code for old OpenGL; it's a lot more complicated
+// to make work on modern APIs, and that's your problem.
+//
+void print_string(float x, float y, char *text, float r, float g, float b)
+{
+ static char buffer[99999]; // ~500 chars
+ int num_quads;
+
+ num_quads = stb_easy_font_print(x, y, text, NULL, buffer, sizeof(buffer));
+
+ glColor3f(r,g,b);
+ glEnableClientState(GL_VERTEX_ARRAY);
+ glVertexPointer(2, GL_FLOAT, 16, buffer);
+ glDrawArrays(GL_QUADS, 0, num_quads*4);
+ glDisableClientState(GL_VERTEX_ARRAY);
+}
+#endif
+
+#ifndef INCLUDE_STB_EASY_FONT_H
+#define INCLUDE_STB_EASY_FONT_H
+
+#include <stdlib.h>
+#include <math.h>
+
+static struct stb_easy_font_info_struct {
+ unsigned char advance;
+ unsigned char h_seg;
+ unsigned char v_seg;
+} stb_easy_font_charinfo[96] = {
+ { 6, 0, 0 }, { 3, 0, 0 }, { 5, 1, 1 }, { 7, 1, 4 },
+ { 7, 3, 7 }, { 7, 6, 12 }, { 7, 8, 19 }, { 4, 16, 21 },
+ { 4, 17, 22 }, { 4, 19, 23 }, { 23, 21, 24 }, { 23, 22, 31 },
+ { 20, 23, 34 }, { 22, 23, 36 }, { 19, 24, 36 }, { 21, 25, 36 },
+ { 6, 25, 39 }, { 6, 27, 43 }, { 6, 28, 45 }, { 6, 30, 49 },
+ { 6, 33, 53 }, { 6, 34, 57 }, { 6, 40, 58 }, { 6, 46, 59 },
+ { 6, 47, 62 }, { 6, 55, 64 }, { 19, 57, 68 }, { 20, 59, 68 },
+ { 21, 61, 69 }, { 22, 66, 69 }, { 21, 68, 69 }, { 7, 73, 69 },
+ { 9, 75, 74 }, { 6, 78, 81 }, { 6, 80, 85 }, { 6, 83, 90 },
+ { 6, 85, 91 }, { 6, 87, 95 }, { 6, 90, 96 }, { 7, 92, 97 },
+ { 6, 96,102 }, { 5, 97,106 }, { 6, 99,107 }, { 6,100,110 },
+ { 6,100,115 }, { 7,101,116 }, { 6,101,121 }, { 6,101,125 },
+ { 6,102,129 }, { 7,103,133 }, { 6,104,140 }, { 6,105,145 },
+ { 7,107,149 }, { 6,108,151 }, { 7,109,155 }, { 7,109,160 },
+ { 7,109,165 }, { 7,118,167 }, { 6,118,172 }, { 4,120,176 },
+ { 6,122,177 }, { 4,122,181 }, { 23,124,182 }, { 22,129,182 },
+ { 4,130,182 }, { 22,131,183 }, { 6,133,187 }, { 22,135,191 },
+ { 6,137,192 }, { 22,139,196 }, { 6,144,197 }, { 22,147,198 },
+ { 6,150,202 }, { 19,151,206 }, { 21,152,207 }, { 6,155,209 },
+ { 3,160,210 }, { 23,160,211 }, { 22,164,216 }, { 22,165,220 },
+ { 22,167,224 }, { 22,169,228 }, { 21,171,232 }, { 21,173,233 },
+ { 5,178,233 }, { 22,179,234 }, { 23,180,238 }, { 23,180,243 },
+ { 23,180,248 }, { 22,189,248 }, { 22,191,252 }, { 5,196,252 },
+ { 3,203,252 }, { 5,203,253 }, { 22,210,253 }, { 0,214,253 },
+};
+
+static unsigned char stb_easy_font_hseg[214] = {
+ 97,37,69,84,28,51,2,18,10,49,98,41,65,25,81,105,33,9,97,1,97,37,37,36,
+ 81,10,98,107,3,100,3,99,58,51,4,99,58,8,73,81,10,50,98,8,73,81,4,10,50,
+ 98,8,25,33,65,81,10,50,17,65,97,25,33,25,49,9,65,20,68,1,65,25,49,41,
+ 11,105,13,101,76,10,50,10,50,98,11,99,10,98,11,50,99,11,50,11,99,8,57,
+ 58,3,99,99,107,10,10,11,10,99,11,5,100,41,65,57,41,65,9,17,81,97,3,107,
+ 9,97,1,97,33,25,9,25,41,100,41,26,82,42,98,27,83,42,98,26,51,82,8,41,
+ 35,8,10,26,82,114,42,1,114,8,9,73,57,81,41,97,18,8,8,25,26,26,82,26,82,
+ 26,82,41,25,33,82,26,49,73,35,90,17,81,41,65,57,41,65,25,81,90,114,20,
+ 84,73,57,41,49,25,33,65,81,9,97,1,97,25,33,65,81,57,33,25,41,25,
+};
+
+static unsigned char stb_easy_font_vseg[253] = {
+ 4,2,8,10,15,8,15,33,8,15,8,73,82,73,57,41,82,10,82,18,66,10,21,29,1,65,
+ 27,8,27,9,65,8,10,50,97,74,66,42,10,21,57,41,29,25,14,81,73,57,26,8,8,
+ 26,66,3,8,8,15,19,21,90,58,26,18,66,18,105,89,28,74,17,8,73,57,26,21,
+ 8,42,41,42,8,28,22,8,8,30,7,8,8,26,66,21,7,8,8,29,7,7,21,8,8,8,59,7,8,
+ 8,15,29,8,8,14,7,57,43,10,82,7,7,25,42,25,15,7,25,41,15,21,105,105,29,
+ 7,57,57,26,21,105,73,97,89,28,97,7,57,58,26,82,18,57,57,74,8,30,6,8,8,
+ 14,3,58,90,58,11,7,74,43,74,15,2,82,2,42,75,42,10,67,57,41,10,7,2,42,
+ 74,106,15,2,35,8,8,29,7,8,8,59,35,51,8,8,15,35,30,35,8,8,30,7,8,8,60,
+ 36,8,45,7,7,36,8,43,8,44,21,8,8,44,35,8,8,43,23,8,8,43,35,8,8,31,21,15,
+ 20,8,8,28,18,58,89,58,26,21,89,73,89,29,20,8,8,30,7,
+};
+
+typedef struct
+{
+ unsigned char c[4];
+} stb_easy_font_color;
+
+static int stb_easy_font_draw_segs(float x, float y, unsigned char *segs, int num_segs, int vertical, stb_easy_font_color c, char *vbuf, int vbuf_size, int offset)
+{
+ int i,j;
+ for (i=0; i < num_segs; ++i) {
+ int len = segs[i] & 7;
+ x += (float) ((segs[i] >> 3) & 1);
+ if (len && offset+64 <= vbuf_size) {
+ float y0 = y + (float) (segs[i]>>4);
+ for (j=0; j < 4; ++j) {
+ * (float *) (vbuf+offset+0) = x + (j==1 || j==2 ? (vertical ? 1 : len) : 0);
+ * (float *) (vbuf+offset+4) = y0 + ( j >= 2 ? (vertical ? len : 1) : 0);
+ * (float *) (vbuf+offset+8) = 0.f;
+ * (stb_easy_font_color *) (vbuf+offset+12) = c;
+ offset += 16;
+ }
+ }
+ }
+ return offset;
+}
+
+static float stb_easy_font_spacing_val = 0;
+static void stb_easy_font_spacing(float spacing)
+{
+ stb_easy_font_spacing_val = spacing;
+}
+
+static int stb_easy_font_print(float x, float y, char *text, unsigned char color[4], void *vertex_buffer, int vbuf_size)
+{
+ char *vbuf = (char *) vertex_buffer;
+ float start_x = x;
+ int offset = 0;
+
+    stb_easy_font_color c = { 255,255,255,255 }; // use structure copying to avoid depending on memcpy()
+ if (color) { c.c[0] = color[0]; c.c[1] = color[1]; c.c[2] = color[2]; c.c[3] = color[3]; }
+
+ while (*text && offset < vbuf_size) {
+ if (*text == '\n') {
+ y += 12;
+ x = start_x;
+ } else {
+ unsigned char advance = stb_easy_font_charinfo[*text-32].advance;
+ float y_ch = advance & 16 ? y+1 : y;
+ int h_seg, v_seg, num_h, num_v;
+ h_seg = stb_easy_font_charinfo[*text-32 ].h_seg;
+ v_seg = stb_easy_font_charinfo[*text-32 ].v_seg;
+ num_h = stb_easy_font_charinfo[*text-32+1].h_seg - h_seg;
+ num_v = stb_easy_font_charinfo[*text-32+1].v_seg - v_seg;
+ offset = stb_easy_font_draw_segs(x, y_ch, &stb_easy_font_hseg[h_seg], num_h, 0, c, vbuf, vbuf_size, offset);
+ offset = stb_easy_font_draw_segs(x, y_ch, &stb_easy_font_vseg[v_seg], num_v, 1, c, vbuf, vbuf_size, offset);
+ x += advance & 15;
+ x += stb_easy_font_spacing_val;
+ }
+ ++text;
+ }
+ return (unsigned) offset/64;
+}
+
+static int stb_easy_font_width(char *text)
+{
+ float len = 0;
+ float max_len = 0;
+ while (*text) {
+ if (*text == '\n') {
+ if (len > max_len) max_len = len;
+ len = 0;
+ } else {
+ len += stb_easy_font_charinfo[*text-32].advance & 15;
+ len += stb_easy_font_spacing_val;
+ }
+ ++text;
+ }
+ if (len > max_len) max_len = len;
+ return (int) ceil(max_len);
+}
+
+static int stb_easy_font_height(char *text)
+{
+ float y = 0;
+ int nonempty_line=0;
+ while (*text) {
+ if (*text == '\n') {
+ y += 12;
+ nonempty_line = 0;
+ } else {
+ nonempty_line = 1;
+ }
+ ++text;
+ }
+ return (int) ceil(y + (nonempty_line ? 12 : 0));
+}
+#endif
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_herringbone_wang_tile.h b/vendor/stb/stb_herringbone_wang_tile.h
new file mode 100644
index 0000000..5517941
--- /dev/null
+++ b/vendor/stb/stb_herringbone_wang_tile.h
@@ -0,0 +1,1221 @@
+/* stbhw - v0.7 - http://nothings.org/gamedev/herringbone
+ Herringbone Wang Tile Generator - Sean Barrett 2014 - public domain
+
+== LICENSE ==============================
+
+This software is dual-licensed to the public domain and under the following
+license: you are granted a perpetual, irrevocable license to copy, modify,
+publish, and distribute this file as you see fit.
+
+== WHAT IT IS ===========================
+
+ This library is an SDK for Herringbone Wang Tile generation:
+
+ http://nothings.org/gamedev/herringbone
+
+ The core design is that you use this library offline to generate a
+ "template" of the tiles you'll create. You then edit those tiles, then
+ load the created tile image file back into this library and use it at
+ runtime to generate "maps".
+
+ You cannot load arbitrary tile image files with this library; it is
+ only designed to load image files made from the template it created.
+ It stores a binary description of the tile sizes & constraints in a
+ few pixels, and uses those to recover the rules, rather than trying
+ to parse the tiles themselves.
+
+ You *can* use this library to generate from arbitrary tile sets, but
+ only by loading the tile set and specifying the constraints explicitly
+ yourself.
+
+== COMPILING ============================
+
+ 1. #define STB_HERRINGBONE_WANG_TILE_IMPLEMENTATION before including this
+ header file in *one* source file to create the implementation
+ in that source file.
+
+ 2. optionally #define STB_HBWANG_RAND() to be a random number
+ generator. if you don't define it, it will use rand(),
+ and you need to seed srand() yourself.
+
+ 3. optionally #define STB_HBWANG_ASSERT(x), otherwise
+ it will use assert()
+
+ 4. optionally #define STB_HBWANG_STATIC to force all symbols to be
+              static instead of public, so they are only accessible
+ in the source file that creates the implementation
+
+ 5. optionally #define STB_HBWANG_NO_REPITITION_REDUCTION to disable
+ the code that tries to reduce having the same tile appear
+              adjacent to itself in wang-corner-tile mode (e.g. if
+              90% of your map should be the same grass tile, you
+              need to disable this system)
+
+ 6. optionally define STB_HBWANG_MAX_X and STB_HBWANG_MAX_Y
+ to be the max dimensions of the generated map in multiples
+ of the wang tile's short side's length (e.g. if you
+              have 20x10 wang tiles, so short_side_len=10, and
+              MAX_X is 17, then the largest map you can generate
+ is 170 pixels wide). The defaults are 100x100. This
+ is used to define static arrays which affect memory
+ usage.
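+
+   For example, a typical configuration in the implementation file
+   might look like the following (my_rand() here is a hypothetical
+   user-supplied RNG, shown only for illustration):
+
+       #define STB_HBWANG_RAND()  (my_rand())   // hypothetical RNG
+       #define STB_HBWANG_MAX_X   200
+       #define STB_HBWANG_MAX_Y   200
+       #define STB_HERRINGBONE_WANG_TILE_IMPLEMENTATION
+       #include "stb_herringbone_wang_tile.h"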
+
+== USING ================================
+
+ To use the map generator, you need a tileset. You can download
+ some sample tilesets from http://nothings.org/gamedev/herringbone
+
+ Then see the "sample application" below.
+
+ You can also use this file to generate templates for
+ tilesets which you then hand-edit to create the data.
+
+
+== MEMORY MANAGEMENT ====================
+
+ The tileset loader allocates memory with malloc(). The map
+ generator does no memory allocation, so e.g. you can load
+ tilesets at startup and never free them and never do any
+ further allocation.
+
+
+== SAMPLE APPLICATION ===================
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+
+#define STB_IMAGE_IMPLEMENTATION
+#include "stb_image.h" // http://nothings.org/stb_image.c
+
+#define STB_IMAGE_WRITE_IMPLEMENTATION
+#include "stb_image_write.h" // http://nothings.org/stb/stb_image_write.h
+
+#define STB_HBWANG_IMPLEMENTATION
+#include "stb_hbwang.h"
+
+int main(int argc, char **argv)
+{
+ unsigned char *data;
+ int xs,ys, w,h;
+ stbhw_tileset ts;
+
+ if (argc != 4) {
+ fprintf(stderr, "Usage: mapgen {tile-file} {xsize} {ysize}\n"
+ "generates file named 'test_map.png'\n");
+ exit(1);
+ }
+ data = stbi_load(argv[1], &w, &h, NULL, 3);
+ xs = atoi(argv[2]);
+ ys = atoi(argv[3]);
+ if (data == NULL) {
+ fprintf(stderr, "Error opening or parsing '%s' as an image file\n", argv[1]);
+ exit(1);
+ }
+ if (xs < 1 || xs > 1000) {
+ fprintf(stderr, "xsize invalid or out of range\n");
+ exit(1);
+ }
+ if (ys < 1 || ys > 1000) {
+ fprintf(stderr, "ysize invalid or out of range\n");
+ exit(1);
+ }
+
+ stbhw_build_tileset_from_image(&ts, data, w*3, w, h);
+ free(data);
+
+   // allocate a buffer to create the final image in
+ data = malloc(3 * xs * ys);
+
+ srand(time(NULL));
+ stbhw_generate_image(&ts, NULL, data, xs*3, xs, ys);
+
+ stbi_write_png("test_map.png", xs, ys, 3, data, xs*3);
+
+ stbhw_free_tileset(&ts);
+ free(data);
+
+ return 0;
+}
+
+== VERSION HISTORY ===================
+
+ 0.7 2019-03-04 - fix warnings
+ 0.6 2014-08-17 - fix broken map-maker
+ 0.5 2014-07-07 - initial release
+
+*/
+
+//////////////////////////////////////////////////////////////////////////////
+// //
+// HEADER FILE SECTION //
+// //
+
+#ifndef INCLUDE_STB_HWANG_H
+#define INCLUDE_STB_HWANG_H
+
+#ifdef STB_HBWANG_STATIC
+#define STBHW_EXTERN static
+#else
+#ifdef __cplusplus
+#define STBHW_EXTERN extern "C"
+#else
+#define STBHW_EXTERN extern
+#endif
+#endif
+
+typedef struct stbhw_tileset stbhw_tileset;
+
+// returns description of last error produced by any function (not thread-safe)
+STBHW_EXTERN const char *stbhw_get_last_error(void);
+
+// build a tileset from an image that conforms to a template created by this
+// library. (you allocate storage for stbhw_tileset and the function fills it
+// out; memory for individual tiles is malloc()ed.)
+// returns non-zero on success, 0 on error
+STBHW_EXTERN int stbhw_build_tileset_from_image(stbhw_tileset *ts,
+ unsigned char *pixels, int stride_in_bytes, int w, int h);
+
+// free a tileset built by stbhw_build_tileset_from_image
+STBHW_EXTERN void stbhw_free_tileset(stbhw_tileset *ts);
+
+// generate a map that is w * h pixels (3-bytes each)
+// returns non-zero on success, 0 on error
+// not thread-safe (uses a global data structure to avoid memory management)
+// weighting should be NULL, as non-NULL weighting is currently untested
+STBHW_EXTERN int stbhw_generate_image(stbhw_tileset *ts, int **weighting,
+ unsigned char *pixels, int stride_in_bytes, int w, int h);
+
+//////////////////////////////////////
+//
+// TILESET DATA STRUCTURE
+//
+// if you use the image-to-tileset system from this file, you
+// don't need to worry about these data structures. but if you
+// want to build/load a tileset yourself, you'll need to fill
+// these out.
+
+typedef struct
+{
+ // the edge or vertex constraints, according to diagram below
+ signed char a,b,c,d,e,f;
+
+ // The herringbone wang tile data; it is a bitmap which is either
+ // w=2*short_sidelen,h=short_sidelen, or w=short_sidelen,h=2*short_sidelen.
+ // it is always RGB, stored row-major, with no padding between rows.
+ // (allocate stbhw_tile structure to be large enough for the pixel data)
+ unsigned char pixels[1];
+} stbhw_tile;
+
+struct stbhw_tileset
+{
+ int is_corner;
+ int num_color[6]; // number of colors for each of 6 edge types or 4 corner types
+ int short_side_len;
+ stbhw_tile **h_tiles;
+ stbhw_tile **v_tiles;
+ int num_h_tiles, max_h_tiles;
+ int num_v_tiles, max_v_tiles;
+};
+
+/////////////// TEMPLATE GENERATOR //////////////////////////
+
+// when requesting a template, you fill out this data
+typedef struct
+{
+ int is_corner; // using corner colors or edge colors?
+   int short_side_len;   // rectangles are 2n x n, n = short_side_len
+ int num_color[6]; // see below diagram for meaning of the index to this;
+ // 6 values if edge (!is_corner), 4 values if is_corner
+ // legal numbers: 1..8 if edge, 1..4 if is_corner
+ int num_vary_x; // additional number of variations along x axis in the template
+ int num_vary_y; // additional number of variations along y axis in the template
+ int corner_type_color_template[4][4];
+ // if corner_type_color_template[s][t] is non-zero, then any
+ // corner of type s generated as color t will get a little
+ // corner sample markup in the template image data
+
+} stbhw_config;
+
+// computes the size needed for the template image
+STBHW_EXTERN void stbhw_get_template_size(stbhw_config *c, int *w, int *h);
+
+// generates a template image, assuming data is 3*w*h bytes long, RGB format
+STBHW_EXTERN int stbhw_make_template(stbhw_config *c, unsigned char *data, int w, int h, int stride_in_bytes);
+
+#endif//INCLUDE_STB_HWANG_H
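+
+// Example (a sketch, not part of the library): generating a template
+// image with the functions declared above. stbi_write_png from
+// stb_image_write is assumed available, and error checking is omitted.
+//
+//    stbhw_config c = {0};
+//    unsigned char *data;
+//    int w, h, i;
+//    c.is_corner      = 0;   // edge-constraint mode
+//    c.short_side_len = 10;
+//    for (i=0; i < 6; ++i) c.num_color[i] = 2;
+//    c.num_vary_x = c.num_vary_y = 1;
+//    stbhw_get_template_size(&c, &w, &h);
+//    data = (unsigned char *) malloc(3*w*h);
+//    if (stbhw_make_template(&c, data, w, h, w*3))
+//       stbi_write_png("template.png", w, h, 3, data, w*3);
+//    free(data);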
+
+
+// TILE CONSTRAINT TYPES
+//
+// there are 4 "types" of corners and 6 types of edges.
+// you can configure the tileset to have different numbers
+// of colors for each type of corner or edge.
+//
+// corner types:
+//
+// 0---*---1---*---2---*---3
+// | | |
+// * * *
+// | | |
+// 1---*---2---*---3 0---*---1---*---2
+// | | |
+// * * *
+// | | |
+// 0---*---1---*---2---*---3
+//
+//
+// edge types:
+//
+// *---2---*---3---* *---0---*
+// | | | |
+// 1 4 5 1
+// | | | |
+// *---0---*---2---* * *
+// | |
+// 4 5
+// | |
+// *---3---*
+//
+// TILE CONSTRAINTS
+//
+// each corner/edge has a color; this shows the name
+// of the variable containing the color
+//
+// corner constraints:
+//
+// a---*---d
+// | |
+// * *
+// | |
+// a---*---b---*---c b e
+// | | | |
+// * * * *
+// | | | |
+// d---*---e---*---f c---*---f
+//
+//
+// edge constraints:
+//
+// *---a---*---b---* *---a---*
+// | | | |
+// c d b c
+// | | | |
+// *---e---*---f---* * *
+// | |
+// d e
+// | |
+// *---f---*
+//
+
+
+//////////////////////////////////////////////////////////////////////////////
+// //
+// IMPLEMENTATION SECTION //
+// //
+
+#ifdef STB_HERRINGBONE_WANG_TILE_IMPLEMENTATION
+
+
+#include <string.h> // memcpy
+#include <stdlib.h> // malloc
+
+#ifndef STB_HBWANG_RAND
+#include <stdlib.h> // rand
+#define STB_HBWANG_RAND() (rand() >> 4)
+#endif
+
+#ifndef STB_HBWANG_ASSERT
+#include <assert.h>
+#define STB_HBWANG_ASSERT(x) assert(x)
+#endif
+
+// map size
+#ifndef STB_HBWANG_MAX_X
+#define STB_HBWANG_MAX_X 100
+#endif
+
+#ifndef STB_HBWANG_MAX_Y
+#define STB_HBWANG_MAX_Y 100
+#endif
+
+// global variables for color assignments
+// @MEMORY change these to just store last two/three rows
+// and keep them on the stack
+static signed char c_color[STB_HBWANG_MAX_Y+6][STB_HBWANG_MAX_X+6];
+static signed char v_color[STB_HBWANG_MAX_Y+6][STB_HBWANG_MAX_X+5];
+static signed char h_color[STB_HBWANG_MAX_Y+5][STB_HBWANG_MAX_X+6];
+
+static const char *stbhw_error;
+STBHW_EXTERN const char *stbhw_get_last_error(void)
+{
+ const char *temp = stbhw_error;
+ stbhw_error = 0;
+ return temp;
+}
+
+
+
+
+/////////////////////////////////////////////////////////////
+//
+// SHARED TEMPLATE-DESCRIPTION CODE
+//
+// Used by both template generator and tileset parser; by
+// using the same code, they are locked in sync and we don't
+// need to try to do more sophisticated parsing of edge color
+// markup or something.
+
+typedef void stbhw__process_rect(struct stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f);
+
+typedef struct stbhw__process
+{
+ stbhw_tileset *ts;
+ stbhw_config *c;
+ stbhw__process_rect *process_h_rect;
+ stbhw__process_rect *process_v_rect;
+ unsigned char *data;
+ int stride,w,h;
+} stbhw__process;
+
+static void stbhw__process_h_row(stbhw__process *p,
+ int xpos, int ypos,
+ int a0, int a1,
+ int b0, int b1,
+ int c0, int c1,
+ int d0, int d1,
+ int e0, int e1,
+ int f0, int f1,
+ int variants)
+{
+ int a,b,c,d,e,f,v;
+
+ for (v=0; v < variants; ++v)
+ for (f=f0; f <= f1; ++f)
+ for (e=e0; e <= e1; ++e)
+ for (d=d0; d <= d1; ++d)
+ for (c=c0; c <= c1; ++c)
+ for (b=b0; b <= b1; ++b)
+ for (a=a0; a <= a1; ++a) {
+ p->process_h_rect(p, xpos, ypos, a,b,c,d,e,f);
+ xpos += 2*p->c->short_side_len + 3;
+ }
+}
+
+static void stbhw__process_v_row(stbhw__process *p,
+ int xpos, int ypos,
+ int a0, int a1,
+ int b0, int b1,
+ int c0, int c1,
+ int d0, int d1,
+ int e0, int e1,
+ int f0, int f1,
+ int variants)
+{
+ int a,b,c,d,e,f,v;
+
+ for (v=0; v < variants; ++v)
+ for (f=f0; f <= f1; ++f)
+ for (e=e0; e <= e1; ++e)
+ for (d=d0; d <= d1; ++d)
+ for (c=c0; c <= c1; ++c)
+ for (b=b0; b <= b1; ++b)
+ for (a=a0; a <= a1; ++a) {
+ p->process_v_rect(p, xpos, ypos, a,b,c,d,e,f);
+ xpos += p->c->short_side_len+3;
+ }
+}
+
+static void stbhw__get_template_info(stbhw_config *c, int *w, int *h, int *h_count, int *v_count)
+{
+ int size_x,size_y;
+ int horz_count,vert_count;
+
+ if (c->is_corner) {
+ int horz_w = c->num_color[1] * c->num_color[2] * c->num_color[3] * c->num_vary_x;
+ int horz_h = c->num_color[0] * c->num_color[1] * c->num_color[2] * c->num_vary_y;
+
+ int vert_w = c->num_color[0] * c->num_color[3] * c->num_color[2] * c->num_vary_y;
+ int vert_h = c->num_color[1] * c->num_color[0] * c->num_color[3] * c->num_vary_x;
+
+ int horz_x = horz_w * (2*c->short_side_len + 3);
+ int horz_y = horz_h * ( c->short_side_len + 3);
+
+ int vert_x = vert_w * ( c->short_side_len + 3);
+ int vert_y = vert_h * (2*c->short_side_len + 3);
+
+ horz_count = horz_w * horz_h;
+ vert_count = vert_w * vert_h;
+
+ size_x = horz_x > vert_x ? horz_x : vert_x;
+ size_y = 2 + horz_y + 2 + vert_y;
+ } else {
+ int horz_w = c->num_color[0] * c->num_color[1] * c->num_color[2] * c->num_vary_x;
+ int horz_h = c->num_color[3] * c->num_color[4] * c->num_color[2] * c->num_vary_y;
+
+ int vert_w = c->num_color[0] * c->num_color[5] * c->num_color[1] * c->num_vary_y;
+ int vert_h = c->num_color[3] * c->num_color[4] * c->num_color[5] * c->num_vary_x;
+
+ int horz_x = horz_w * (2*c->short_side_len + 3);
+ int horz_y = horz_h * ( c->short_side_len + 3);
+
+ int vert_x = vert_w * ( c->short_side_len + 3);
+ int vert_y = vert_h * (2*c->short_side_len + 3);
+
+ horz_count = horz_w * horz_h;
+ vert_count = vert_w * vert_h;
+
+ size_x = horz_x > vert_x ? horz_x : vert_x;
+ size_y = 2 + horz_y + 2 + vert_y;
+ }
+ if (w) *w = size_x;
+ if (h) *h = size_y;
+ if (h_count) *h_count = horz_count;
+ if (v_count) *v_count = vert_count;
+}
+
+STBHW_EXTERN void stbhw_get_template_size(stbhw_config *c, int *w, int *h)
+{
+ stbhw__get_template_info(c, w, h, NULL, NULL);
+}
+
+static int stbhw__process_template(stbhw__process *p)
+{
+ int i,j,k,q, ypos;
+ int size_x, size_y;
+ stbhw_config *c = p->c;
+
+ stbhw__get_template_info(c, &size_x, &size_y, NULL, NULL);
+
+ if (p->w < size_x || p->h < size_y) {
+ stbhw_error = "image too small for configuration";
+ return 0;
+ }
+
+ if (c->is_corner) {
+ ypos = 2;
+ for (k=0; k < c->num_color[2]; ++k) {
+ for (j=0; j < c->num_color[1]; ++j) {
+ for (i=0; i < c->num_color[0]; ++i) {
+ for (q=0; q < c->num_vary_y; ++q) {
+ stbhw__process_h_row(p, 0,ypos,
+ 0,c->num_color[1]-1, 0,c->num_color[2]-1, 0,c->num_color[3]-1,
+ i,i, j,j, k,k,
+ c->num_vary_x);
+ ypos += c->short_side_len + 3;
+ }
+ }
+ }
+ }
+ ypos += 2;
+ for (k=0; k < c->num_color[3]; ++k) {
+ for (j=0; j < c->num_color[0]; ++j) {
+ for (i=0; i < c->num_color[1]; ++i) {
+ for (q=0; q < c->num_vary_x; ++q) {
+ stbhw__process_v_row(p, 0,ypos,
+ 0,c->num_color[0]-1, 0,c->num_color[3]-1, 0,c->num_color[2]-1,
+ i,i, j,j, k,k,
+ c->num_vary_y);
+ ypos += (c->short_side_len*2) + 3;
+ }
+ }
+ }
+ }
+ assert(ypos == size_y);
+ } else {
+ ypos = 2;
+ for (k=0; k < c->num_color[3]; ++k) {
+ for (j=0; j < c->num_color[4]; ++j) {
+ for (i=0; i < c->num_color[2]; ++i) {
+ for (q=0; q < c->num_vary_y; ++q) {
+ stbhw__process_h_row(p, 0,ypos,
+ 0,c->num_color[2]-1, k,k,
+ 0,c->num_color[1]-1, j,j,
+ 0,c->num_color[0]-1, i,i,
+ c->num_vary_x);
+ ypos += c->short_side_len + 3;
+ }
+ }
+ }
+ }
+ ypos += 2;
+ for (k=0; k < c->num_color[3]; ++k) {
+ for (j=0; j < c->num_color[4]; ++j) {
+ for (i=0; i < c->num_color[5]; ++i) {
+ for (q=0; q < c->num_vary_x; ++q) {
+ stbhw__process_v_row(p, 0,ypos,
+ 0,c->num_color[0]-1, i,i,
+ 0,c->num_color[1]-1, j,j,
+ 0,c->num_color[5]-1, k,k,
+ c->num_vary_y);
+ ypos += (c->short_side_len*2) + 3;
+ }
+ }
+ }
+ }
+ assert(ypos == size_y);
+ }
+ return 1;
+}
+
+
+/////////////////////////////////////////////////////////////
+//
+// MAP GENERATOR
+//
+
+static void stbhw__draw_pixel(unsigned char *output, int stride, int x, int y, unsigned char c[3])
+{
+ memcpy(output + y*stride + x*3, c, 3);
+}
+
+static void stbhw__draw_h_tile(unsigned char *output, int stride, int xmax, int ymax, int x, int y, stbhw_tile *h, int sz)
+{
+ int i,j;
+ for (j=0; j < sz; ++j)
+ if (y+j >= 0 && y+j < ymax)
+ for (i=0; i < sz*2; ++i)
+ if (x+i >= 0 && x+i < xmax)
+ stbhw__draw_pixel(output,stride, x+i,y+j, &h->pixels[(j*sz*2 + i)*3]);
+}
+
+static void stbhw__draw_v_tile(unsigned char *output, int stride, int xmax, int ymax, int x, int y, stbhw_tile *h, int sz)
+{
+ int i,j;
+ for (j=0; j < sz*2; ++j)
+ if (y+j >= 0 && y+j < ymax)
+ for (i=0; i < sz; ++i)
+ if (x+i >= 0 && x+i < xmax)
+ stbhw__draw_pixel(output,stride, x+i,y+j, &h->pixels[(j*sz + i)*3]);
+}
+
+
+// randomly choose a tile that fits constraints for a given spot, and update the constraints
+static stbhw_tile * stbhw__choose_tile(stbhw_tile **list, int numlist,
+ signed char *a, signed char *b, signed char *c,
+ signed char *d, signed char *e, signed char *f,
+ int **weighting)
+{
+ int i,n,m = 1<<30,pass;
+ for (pass=0; pass < 2; ++pass) {
+ n=0;
+ // pass #1:
+ // count number of variants that match this partial set of constraints
+ // pass #2:
+ // stop on randomly selected match
+ for (i=0; i < numlist; ++i) {
+ stbhw_tile *h = list[i];
+ if ((*a < 0 || *a == h->a) &&
+ (*b < 0 || *b == h->b) &&
+ (*c < 0 || *c == h->c) &&
+ (*d < 0 || *d == h->d) &&
+ (*e < 0 || *e == h->e) &&
+ (*f < 0 || *f == h->f)) {
+ if (weighting)
+ n += weighting[0][i];
+ else
+ n += 1;
+ if (n > m) {
+ // use list[i]
+ // update constraints to reflect what we placed
+ *a = h->a;
+ *b = h->b;
+ *c = h->c;
+ *d = h->d;
+ *e = h->e;
+ *f = h->f;
+ return h;
+ }
+ }
+ }
+ if (n == 0) {
+ stbhw_error = "couldn't find tile matching constraints";
+ return NULL;
+ }
+ m = STB_HBWANG_RAND() % n;
+ }
+ STB_HBWANG_ASSERT(0);
+ return NULL;
+}
+
+static int stbhw__match(int x, int y)
+{
+ return c_color[y][x] == c_color[y+1][x+1];
+}
+
+static int stbhw__weighted(int num_options, int *weights)
+{
+ int k, total, choice;
+ total = 0;
+ for (k=0; k < num_options; ++k)
+ total += weights[k];
+ choice = STB_HBWANG_RAND() % total;
+ total = 0;
+ for (k=0; k < num_options; ++k) {
+ total += weights[k];
+ if (choice < total)
+ break;
+ }
+ STB_HBWANG_ASSERT(k < num_options);
+ return k;
+}
+
+static int stbhw__change_color(int old_color, int num_options, int *weights)
+{
+ if (weights) {
+ int k, total, choice;
+ total = 0;
+ for (k=0; k < num_options; ++k)
+ if (k != old_color)
+ total += weights[k];
+ choice = STB_HBWANG_RAND() % total;
+ total = 0;
+ for (k=0; k < num_options; ++k) {
+ if (k != old_color) {
+ total += weights[k];
+ if (choice < total)
+ break;
+ }
+ }
+ STB_HBWANG_ASSERT(k < num_options);
+ return k;
+ } else {
+ int offset = 1+STB_HBWANG_RAND() % (num_options-1);
+ return (old_color+offset) % num_options;
+ }
+}
+
+
+
+// generate a map that is w * h pixels (3-bytes each)
+// returns 1 on success, 0 on error
+STBHW_EXTERN int stbhw_generate_image(stbhw_tileset *ts, int **weighting, unsigned char *output, int stride, int w, int h)
+{
+ int sidelen = ts->short_side_len;
+ int xmax = (w / sidelen) + 6;
+ int ymax = (h / sidelen) + 6;
+ if (xmax > STB_HBWANG_MAX_X+6 || ymax > STB_HBWANG_MAX_Y+6) {
+ stbhw_error = "increase STB_HBWANG_MAX_X/Y";
+ return 0;
+ }
+
+ if (ts->is_corner) {
+ int i,j, ypos;
+ int *cc = ts->num_color;
+
+ for (j=0; j < ymax; ++j) {
+ for (i=0; i < xmax; ++i) {
+ int p = (i-j+1)&3; // corner type
+ if (weighting==NULL || weighting[p]==0 || cc[p] == 1)
+ c_color[j][i] = STB_HBWANG_RAND() % cc[p];
+ else
+ c_color[j][i] = stbhw__weighted(cc[p], weighting[p]);
+ }
+ }
+ #ifndef STB_HBWANG_NO_REPITITION_REDUCTION
+      // now go back through and make sure we don't have adjacent 3x2 vertices that are identical,
+ // to avoid really obvious repetition (which happens easily with extreme weights)
+ for (j=0; j < ymax-3; ++j) {
+ for (i=0; i < xmax-3; ++i) {
+ //int p = (i-j+1) & 3; // corner type // unused, not sure what the intent was so commenting it out
+ STB_HBWANG_ASSERT(i+3 < STB_HBWANG_MAX_X+6);
+ STB_HBWANG_ASSERT(j+3 < STB_HBWANG_MAX_Y+6);
+ if (stbhw__match(i,j) && stbhw__match(i,j+1) && stbhw__match(i,j+2)
+ && stbhw__match(i+1,j) && stbhw__match(i+1,j+1) && stbhw__match(i+1,j+2)) {
+ int p = ((i+1)-(j+1)+1) & 3;
+ if (cc[p] > 1)
+ c_color[j+1][i+1] = stbhw__change_color(c_color[j+1][i+1], cc[p], weighting ? weighting[p] : NULL);
+ }
+ if (stbhw__match(i,j) && stbhw__match(i+1,j) && stbhw__match(i+2,j)
+ && stbhw__match(i,j+1) && stbhw__match(i+1,j+1) && stbhw__match(i+2,j+1)) {
+ int p = ((i+2)-(j+1)+1) & 3;
+ if (cc[p] > 1)
+ c_color[j+1][i+2] = stbhw__change_color(c_color[j+1][i+2], cc[p], weighting ? weighting[p] : NULL);
+ }
+ }
+ }
+ #endif
+
+ ypos = -1 * sidelen;
+ for (j = -1; ypos < h; ++j) {
+ // a general herringbone row consists of:
+ // horizontal left block, the bottom of a previous vertical, the top of a new vertical
+ int phase = (j & 3);
+ // displace horizontally according to pattern
+ if (phase == 0) {
+ i = 0;
+ } else {
+ i = phase-4;
+ }
+ for (;; i += 4) {
+ int xpos = i * sidelen;
+ if (xpos >= w)
+ break;
+ // horizontal left-block
+ if (xpos + sidelen*2 >= 0 && ypos >= 0) {
+ stbhw_tile *t = stbhw__choose_tile(
+ ts->h_tiles, ts->num_h_tiles,
+ &c_color[j+2][i+2], &c_color[j+2][i+3], &c_color[j+2][i+4],
+ &c_color[j+3][i+2], &c_color[j+3][i+3], &c_color[j+3][i+4],
+ weighting
+ );
+ if (t == NULL)
+ return 0;
+ stbhw__draw_h_tile(output,stride,w,h, xpos, ypos, t, sidelen);
+ }
+ xpos += sidelen * 2;
+ // now we're at the end of a previous vertical one
+ xpos += sidelen;
+ // now we're at the start of a new vertical one
+ if (xpos < w) {
+ stbhw_tile *t = stbhw__choose_tile(
+ ts->v_tiles, ts->num_v_tiles,
+ &c_color[j+2][i+5], &c_color[j+3][i+5], &c_color[j+4][i+5],
+ &c_color[j+2][i+6], &c_color[j+3][i+6], &c_color[j+4][i+6],
+ weighting
+ );
+ if (t == NULL)
+ return 0;
+ stbhw__draw_v_tile(output,stride,w,h, xpos, ypos, t, sidelen);
+ }
+ }
+ ypos += sidelen;
+ }
+ } else {
+ // @TODO edge-color repetition reduction
+ int i,j, ypos;
+ memset(v_color, -1, sizeof(v_color));
+ memset(h_color, -1, sizeof(h_color));
+
+ ypos = -1 * sidelen;
+      for (j = -1; ypos < h; ++j) {
+         // a general herringbone row consists of:
+         //    horizontal left block, the bottom of a previous vertical, the top of a new vertical
+         int phase = (j & 3);
+         // displace horizontally according to pattern
+         if (phase == 0) {
+            i = 0;
+         } else {
+            i = phase-4;
+         }
+         for (;; i += 4) {
+            int xpos = i * sidelen;
+            if (xpos >= w)
+               break;
+ // horizontal left-block
+ if (xpos + sidelen*2 >= 0 && ypos >= 0) {
+ stbhw_tile *t = stbhw__choose_tile(
+ ts->h_tiles, ts->num_h_tiles,
+ &h_color[j+2][i+2], &h_color[j+2][i+3],
+ &v_color[j+2][i+2], &v_color[j+2][i+4],
+ &h_color[j+3][i+2], &h_color[j+3][i+3],
+ weighting
+ );
+ if (t == NULL) return 0;
+ stbhw__draw_h_tile(output,stride,w,h, xpos, ypos, t, sidelen);
+ }
+ xpos += sidelen * 2;
+ // now we're at the end of a previous vertical one
+ xpos += sidelen;
+ // now we're at the start of a new vertical one
+ if (xpos < w) {
+ stbhw_tile *t = stbhw__choose_tile(
+ ts->v_tiles, ts->num_v_tiles,
+ &h_color[j+2][i+5],
+ &v_color[j+2][i+5], &v_color[j+2][i+6],
+ &v_color[j+3][i+5], &v_color[j+3][i+6],
+ &h_color[j+4][i+5],
+ weighting
+ );
+ if (t == NULL) return 0;
+ stbhw__draw_v_tile(output,stride,w,h, xpos, ypos, t, sidelen);
+ }
+ }
+ ypos += sidelen;
+ }
+ }
+ return 1;
+}
+
+static void stbhw__parse_h_rect(stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f)
+{
+ int len = p->c->short_side_len;
+ stbhw_tile *h = (stbhw_tile *) malloc(sizeof(*h)-1 + 3 * (len*2) * len);
+ int i,j;
+ ++xpos;
+ ++ypos;
+ h->a = a, h->b = b, h->c = c, h->d = d, h->e = e, h->f = f;
+ for (j=0; j < len; ++j)
+ for (i=0; i < len*2; ++i)
+ memcpy(h->pixels + j*(3*len*2) + i*3, p->data+(ypos+j)*p->stride+(xpos+i)*3, 3);
+ STB_HBWANG_ASSERT(p->ts->num_h_tiles < p->ts->max_h_tiles);
+ p->ts->h_tiles[p->ts->num_h_tiles++] = h;
+}
+
+static void stbhw__parse_v_rect(stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f)
+{
+ int len = p->c->short_side_len;
+ stbhw_tile *h = (stbhw_tile *) malloc(sizeof(*h)-1 + 3 * (len*2) * len);
+ int i,j;
+ ++xpos;
+ ++ypos;
+ h->a = a, h->b = b, h->c = c, h->d = d, h->e = e, h->f = f;
+ for (j=0; j < len*2; ++j)
+ for (i=0; i < len; ++i)
+ memcpy(h->pixels + j*(3*len) + i*3, p->data+(ypos+j)*p->stride+(xpos+i)*3, 3);
+ STB_HBWANG_ASSERT(p->ts->num_v_tiles < p->ts->max_v_tiles);
+ p->ts->v_tiles[p->ts->num_v_tiles++] = h;
+}
+
+STBHW_EXTERN int stbhw_build_tileset_from_image(stbhw_tileset *ts, unsigned char *data, int stride, int w, int h)
+{
+ int i, h_count, v_count;
+ unsigned char header[9];
+ stbhw_config c = { 0 };
+ stbhw__process p = { 0 };
+
+ // extract binary header
+
+ // remove encoding that makes it more visually obvious it encodes actual data
+ for (i=0; i < 9; ++i)
+ header[i] = data[w*3 - 1 - i] ^ (i*55);
+
+ // extract header info
+ if (header[7] == 0xc0) {
+ // corner-type
+ c.is_corner = 1;
+ for (i=0; i < 4; ++i)
+ c.num_color[i] = header[i];
+ c.num_vary_x = header[4];
+ c.num_vary_y = header[5];
+ c.short_side_len = header[6];
+ } else {
+ c.is_corner = 0;
+ // edge-type
+ for (i=0; i < 6; ++i)
+ c.num_color[i] = header[i];
+ c.num_vary_x = header[6];
+ c.num_vary_y = header[7];
+ c.short_side_len = header[8];
+ }
+
+ if (c.num_vary_x < 0 || c.num_vary_x > 64 || c.num_vary_y < 0 || c.num_vary_y > 64)
+ return 0;
+ if (c.short_side_len == 0)
+ return 0;
+ if (c.num_color[0] > 32 || c.num_color[1] > 32 || c.num_color[2] > 32 || c.num_color[3] > 32)
+ return 0;
+
+ stbhw__get_template_info(&c, NULL, NULL, &h_count, &v_count);
+
+ ts->is_corner = c.is_corner;
+ ts->short_side_len = c.short_side_len;
+ memcpy(ts->num_color, c.num_color, sizeof(ts->num_color));
+
+ ts->max_h_tiles = h_count;
+ ts->max_v_tiles = v_count;
+
+ ts->num_h_tiles = ts->num_v_tiles = 0;
+
+ ts->h_tiles = (stbhw_tile **) malloc(sizeof(*ts->h_tiles) * h_count);
+ ts->v_tiles = (stbhw_tile **) malloc(sizeof(*ts->v_tiles) * v_count);
+
+ p.ts = ts;
+ p.data = data;
+ p.stride = stride;
+ p.process_h_rect = stbhw__parse_h_rect;
+ p.process_v_rect = stbhw__parse_v_rect;
+ p.w = w;
+ p.h = h;
+ p.c = &c;
+
+ // load all the tiles out of the image
+ return stbhw__process_template(&p);
+}
+
+STBHW_EXTERN void stbhw_free_tileset(stbhw_tileset *ts)
+{
+ int i;
+ for (i=0; i < ts->num_h_tiles; ++i)
+ free(ts->h_tiles[i]);
+ for (i=0; i < ts->num_v_tiles; ++i)
+ free(ts->v_tiles[i]);
+ free(ts->h_tiles);
+ free(ts->v_tiles);
+ ts->h_tiles = NULL;
+ ts->v_tiles = NULL;
+ ts->num_h_tiles = ts->max_h_tiles = 0;
+ ts->num_v_tiles = ts->max_v_tiles = 0;
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// GENERATOR
+//
+//
+
+
+// shared code
+
+static void stbhw__set_pixel(unsigned char *data, int stride, int xpos, int ypos, unsigned char color[3])
+{
+ memcpy(data + ypos*stride + xpos*3, color, 3);
+}
+
+static void stbhw__stbhw__set_pixel_whiten(unsigned char *data, int stride, int xpos, int ypos, unsigned char color[3])
+{
+ unsigned char c2[3];
+ int i;
+ for (i=0; i < 3; ++i)
+ c2[i] = (color[i]*2 + 255)/3;
+ memcpy(data + ypos*stride + xpos*3, c2, 3);
+}
+
+
+static unsigned char stbhw__black[3] = { 0,0,0 };
+
+// each edge set gets its own unique color variants
+// used http://phrogz.net/css/distinct-colors.html to generate this set,
+// but it's not very good and needs to be revised
+
+static unsigned char stbhw__color[7][8][3] =
+{
+ { {255,51,51} , {143,143,29}, {0,199,199}, {159,119,199}, {0,149,199} , {143, 0,143}, {255,128,0}, {64,255,0}, },
+ { {235,255,30 }, {255,0,255}, {199,139,119}, {29,143, 57}, {143,0,71} , { 0,143,143}, {0,99,199}, {143,71,0}, },
+ { {0,149,199} , {143, 0,143}, {255,128,0}, {64,255,0}, {255,191,0} , {51,255,153}, {0,0,143}, {199,119,159},},
+ { {143,0,71} , { 0,143,143}, {0,99,199}, {143,71,0}, {255,190,153}, { 0,255,255}, {128,0,255}, {255,51,102},},
+ { {255,191,0} , {51,255,153}, {0,0,143}, {199,119,159}, {255,51,51} , {143,143,29}, {0,199,199}, {159,119,199},},
+ { {255,190,153}, { 0,255,255}, {128,0,255}, {255,51,102}, {235,255,30 }, {255,0,255}, {199,139,119}, {29,143, 57}, },
+
+ { {40,40,40 }, { 90,90,90 }, { 150,150,150 }, { 200,200,200 },
+ { 255,90,90 }, { 160,160,80}, { 50,150,150 }, { 200,50,200 } },
+};
+
+static void stbhw__draw_hline(unsigned char *data, int stride, int xpos, int ypos, int color, int len, int slot)
+{
+ int i;
+ int j = len * 6 / 16;
+ int k = len * 10 / 16;
+ for (i=0; i < len; ++i)
+ stbhw__set_pixel(data, stride, xpos+i, ypos, stbhw__black);
+ if (k-j < 2) {
+ j = len/2 - 1;
+ k = j+2;
+ if (len & 1)
+ ++k;
+ }
+ for (i=j; i < k; ++i)
+ stbhw__stbhw__set_pixel_whiten(data, stride, xpos+i, ypos, stbhw__color[slot][color]);
+}
+
+static void stbhw__draw_vline(unsigned char *data, int stride, int xpos, int ypos, int color, int len, int slot)
+{
+ int i;
+ int j = len * 6 / 16;
+ int k = len * 10 / 16;
+ for (i=0; i < len; ++i)
+ stbhw__set_pixel(data, stride, xpos, ypos+i, stbhw__black);
+ if (k-j < 2) {
+ j = len/2 - 1;
+ k = j+2;
+ if (len & 1)
+ ++k;
+ }
+ for (i=j; i < k; ++i)
+ stbhw__stbhw__set_pixel_whiten(data, stride, xpos, ypos+i, stbhw__color[slot][color]);
+}
+
+// 0--*--1--*--2--*--3
+// | | |
+// * * *
+// | | |
+// 1--*--2--*--3 0--*--1--*--2
+// | | |
+// * * *
+// | | |
+// 0--*--1--*--2--*--3
+//
+// variables while enumerating (no correspondence between corners
+// of the types is implied by these variables)
+//
+// a-----b-----c a-----d
+// | | | |
+// | | | |
+// | | | |
+// d-----e-----f b e
+// | |
+// | |
+// | |
+// c-----f
+//
+
+unsigned char stbhw__corner_colors[4][4][3] =
+{
+ { { 255,0,0 }, { 200,200,200 }, { 100,100,200 }, { 255,200,150 }, },
+ { { 0,0,255 }, { 255,255,0 }, { 100,200,100 }, { 150,255,200 }, },
+ { { 255,0,255 }, { 80,80,80 }, { 200,100,100 }, { 200,150,255 }, },
+ { { 0,255,255 }, { 0,255,0 }, { 200,120,200 }, { 255,200,200 }, },
+};
+
+int stbhw__corner_colors_to_edge_color[4][4] =
+{
+ // 0 1 2 3
+ { 0, 1, 4, 9, }, // 0
+ { 2, 3, 5, 10, }, // 1
+ { 6, 7, 8, 11, }, // 2
+ { 12, 13, 14, 15, }, // 3
+};
+
+#define stbhw__c2e stbhw__corner_colors_to_edge_color
+
+static void stbhw__draw_clipped_corner(unsigned char *data, int stride, int xpos, int ypos, int w, int h, int x, int y)
+{
+ static unsigned char template_color[3] = { 167,204,204 };
+ int i,j;
+ for (j = -2; j <= 1; ++j) {
+ for (i = -2; i <= 1; ++i) {
+ if ((i == -2 || i == 1) && (j == -2 || j == 1))
+ continue;
+ else {
+ if (x+i < 1 || x+i > w) continue;
+ if (y+j < 1 || y+j > h) continue;
+ stbhw__set_pixel(data, stride, xpos+x+i, ypos+y+j, template_color);
+
+ }
+ }
+ }
+}
+
+static void stbhw__edge_process_h_rect(stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f)
+{
+ int len = p->c->short_side_len;
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos , a, len, 2);
+ stbhw__draw_hline(p->data, p->stride, xpos+ len+1 , ypos , b, len, 3);
+ stbhw__draw_vline(p->data, p->stride, xpos , ypos+1 , c, len, 1);
+ stbhw__draw_vline(p->data, p->stride, xpos+2*len+1 , ypos+1 , d, len, 4);
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos + len+1, e, len, 0);
+ stbhw__draw_hline(p->data, p->stride, xpos + len+1 , ypos + len+1, f, len, 2);
+}
+
+static void stbhw__edge_process_v_rect(stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f)
+{
+ int len = p->c->short_side_len;
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos , a, len, 0);
+ stbhw__draw_vline(p->data, p->stride, xpos , ypos+1 , b, len, 5);
+ stbhw__draw_vline(p->data, p->stride, xpos + len+1, ypos+1 , c, len, 1);
+ stbhw__draw_vline(p->data, p->stride, xpos , ypos + len+1, d, len, 4);
+ stbhw__draw_vline(p->data, p->stride, xpos + len+1, ypos + len+1, e, len, 5);
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos + 2*len+1, f, len, 3);
+}
+
+static void stbhw__corner_process_h_rect(stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f)
+{
+ int len = p->c->short_side_len;
+
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos , stbhw__c2e[a][b], len, 2);
+ stbhw__draw_hline(p->data, p->stride, xpos+ len+1 , ypos , stbhw__c2e[b][c], len, 3);
+ stbhw__draw_vline(p->data, p->stride, xpos , ypos+1 , stbhw__c2e[a][d], len, 1);
+ stbhw__draw_vline(p->data, p->stride, xpos+2*len+1 , ypos+1 , stbhw__c2e[c][f], len, 4);
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos + len+1, stbhw__c2e[d][e], len, 0);
+ stbhw__draw_hline(p->data, p->stride, xpos + len+1 , ypos + len+1, stbhw__c2e[e][f], len, 2);
+
+ if (p->c->corner_type_color_template[1][a]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len*2,len, 1,1);
+ if (p->c->corner_type_color_template[2][b]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len*2,len, len+1,1);
+ if (p->c->corner_type_color_template[3][c]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len*2,len, len*2+1,1);
+
+ if (p->c->corner_type_color_template[0][d]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len*2,len, 1,len+1);
+ if (p->c->corner_type_color_template[1][e]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len*2,len, len+1,len+1);
+ if (p->c->corner_type_color_template[2][f]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len*2,len, len*2+1,len+1);
+
+ stbhw__set_pixel(p->data, p->stride, xpos , ypos, stbhw__corner_colors[1][a]);
+ stbhw__set_pixel(p->data, p->stride, xpos+len , ypos, stbhw__corner_colors[2][b]);
+ stbhw__set_pixel(p->data, p->stride, xpos+2*len+1, ypos, stbhw__corner_colors[3][c]);
+ stbhw__set_pixel(p->data, p->stride, xpos , ypos+len+1, stbhw__corner_colors[0][d]);
+ stbhw__set_pixel(p->data, p->stride, xpos+len , ypos+len+1, stbhw__corner_colors[1][e]);
+ stbhw__set_pixel(p->data, p->stride, xpos+2*len+1, ypos+len+1, stbhw__corner_colors[2][f]);
+}
+
+static void stbhw__corner_process_v_rect(stbhw__process *p, int xpos, int ypos,
+ int a, int b, int c, int d, int e, int f)
+{
+ int len = p->c->short_side_len;
+
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos , stbhw__c2e[a][d], len, 0);
+ stbhw__draw_vline(p->data, p->stride, xpos , ypos+1 , stbhw__c2e[a][b], len, 5);
+ stbhw__draw_vline(p->data, p->stride, xpos + len+1, ypos+1 , stbhw__c2e[d][e], len, 1);
+ stbhw__draw_vline(p->data, p->stride, xpos , ypos + len+1, stbhw__c2e[b][c], len, 4);
+ stbhw__draw_vline(p->data, p->stride, xpos + len+1, ypos + len+1, stbhw__c2e[e][f], len, 5);
+ stbhw__draw_hline(p->data, p->stride, xpos+1 , ypos + 2*len+1, stbhw__c2e[c][f], len, 3);
+
+ if (p->c->corner_type_color_template[0][a]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len,len*2, 1,1);
+ if (p->c->corner_type_color_template[3][b]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len,len*2, 1,len+1);
+ if (p->c->corner_type_color_template[2][c]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len,len*2, 1,len*2+1);
+
+ if (p->c->corner_type_color_template[1][d]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len,len*2, len+1,1);
+ if (p->c->corner_type_color_template[0][e]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len,len*2, len+1,len+1);
+ if (p->c->corner_type_color_template[3][f]) stbhw__draw_clipped_corner(p->data,p->stride, xpos,ypos, len,len*2, len+1,len*2+1);
+
+ stbhw__set_pixel(p->data, p->stride, xpos , ypos , stbhw__corner_colors[0][a]);
+ stbhw__set_pixel(p->data, p->stride, xpos , ypos+len , stbhw__corner_colors[3][b]);
+ stbhw__set_pixel(p->data, p->stride, xpos , ypos+2*len+1, stbhw__corner_colors[2][c]);
+ stbhw__set_pixel(p->data, p->stride, xpos+len+1, ypos , stbhw__corner_colors[1][d]);
+ stbhw__set_pixel(p->data, p->stride, xpos+len+1, ypos+len , stbhw__corner_colors[0][e]);
+ stbhw__set_pixel(p->data, p->stride, xpos+len+1, ypos+2*len+1, stbhw__corner_colors[3][f]);
+}
+
+// generates a template image, assuming data is 3*w*h bytes long, RGB format
+STBHW_EXTERN int stbhw_make_template(stbhw_config *c, unsigned char *data, int w, int h, int stride_in_bytes)
+{
+ stbhw__process p;
+ int i;
+
+ p.data = data;
+ p.w = w;
+ p.h = h;
+ p.stride = stride_in_bytes;
+ p.ts = 0;
+ p.c = c;
+
+ if (c->is_corner) {
+ p.process_h_rect = stbhw__corner_process_h_rect;
+ p.process_v_rect = stbhw__corner_process_v_rect;
+ } else {
+ p.process_h_rect = stbhw__edge_process_h_rect;
+ p.process_v_rect = stbhw__edge_process_v_rect;
+ }
+
+ for (i=0; i < p.h; ++i)
+ memset(p.data + i*p.stride, 255, 3*p.w);
+
+ if (!stbhw__process_template(&p))
+ return 0;
+
+ if (c->is_corner) {
+ // write out binary information in first line of image
+ for (i=0; i < 4; ++i)
+ data[w*3-1-i] = c->num_color[i];
+ data[w*3-1-i] = c->num_vary_x;
+ data[w*3-2-i] = c->num_vary_y;
+ data[w*3-3-i] = c->short_side_len;
+ data[w*3-4-i] = 0xc0;
+ } else {
+ for (i=0; i < 6; ++i)
+ data[w*3-1-i] = c->num_color[i];
+ data[w*3-1-i] = c->num_vary_x;
+ data[w*3-2-i] = c->num_vary_y;
+ data[w*3-3-i] = c->short_side_len;
+ }
+
+ // make it more obvious it encodes actual data
+ for (i=0; i < 9; ++i)
+ p.data[p.w*3 - 1 - i] ^= i*55;
+
+ return 1;
+}
+#endif // STB_HBWANG_IMPLEMENTATION
diff --git a/vendor/stb/stb_hexwave.h b/vendor/stb/stb_hexwave.h
new file mode 100644
index 0000000..480ab1b
--- /dev/null
+++ b/vendor/stb/stb_hexwave.h
@@ -0,0 +1,680 @@
+// stb_hexwave - v0.5 - public domain, initial release 2021-04-01
+//
+// A flexible anti-aliased (bandlimited) digital audio oscillator.
+//
+// This library generates waveforms of a variety of shapes made of
+// line segments. It does not do envelopes, LFO effects, etc.; it
+// merely tries to solve the problem of generating an artifact-free
+// morphable digital waveform with a variety of spectra, and leaves
+// it to the user to rescale the waveform and mix multiple voices, etc.
+//
+// Compiling:
+//
+// In one C/C++ file that #includes this file, do
+//
+// #define STB_HEXWAVE_IMPLEMENTATION
+// #include "stb_hexwave.h"
+//
+// Optionally, #define STB_HEXWAVE_STATIC before including
+// the header to cause the definitions to be private to the
+// implementation file (i.e. to be "static" instead of "extern").
+//
+// Notes:
+//
+// Optionally performs memory allocation during initialization,
+// never allocates otherwise.
+//
+// License:
+//
+// See end of file for license information.
+//
+// Usage:
+//
+// Initialization:
+//
+// hexwave_init(32,16,NULL); // read "header section" for alternatives
+//
+// Create oscillator:
+//
+// HexWave *osc = malloc(sizeof(*osc)); // or "new HexWave", or declare globally or on stack
+// hexwave_create(osc, reflect_flag, peak_time, half_height, zero_wait);
+// see "Waveform shapes" below for the meaning of these parameters
+//
+// Generate audio:
+//
+// hexwave_generate_samples(output, number_of_samples, osc, oscillator_freq)
+// where:
+// output is a buffer where the library will store floating point audio samples
+// number_of_samples is the number of audio samples to generate
+// osc is a pointer to a HexWave
+// oscillator_freq is the frequency of the oscillator divided by the sample rate
+//
+// The output samples will continue from where the samples generated by the
+// previous hexwave_generate_samples() on this oscillator ended.
+//
+// Change oscillator waveform:
+//
+// hexwave_change(osc, reflect_flag, peak_time, half_height, zero_wait);
+// can call in between calls to hexwave_generate_samples
+//
+// Waveform shapes:
+//
+// All waveforms generated by hexwave are constructed from six line segments
+// characterized by 3 parameters.
+//
+// See demonstration: https://www.youtube.com/watch?v=hsUCrAsDN-M
+//
+// reflect=0 reflect=1
+//
+// 0-----P---1 0-----P---1 peak_time = P
+// . 1 . 1
+// /\_ : /\_ :
+// / \_ : / \_ :
+// / \.H / \.H half_height = H
+// / | : / | :
+// _____/ |_:___ _____/ | : _____
+// . : \ | . | : /
+// . : \ | . | : /
+// . : \ _/ . \_: /
+// . : \ _/ . :_ /
+// . -1 \/ . -1 \/
+// 0 - Z - - - - 1 0 - Z - - - - 1 zero_wait = Z
+//
+// Classic waveforms:
+// peak half zero
+// reflect time height wait
+// Sawtooth 1 0 0 0
+// Square 1 0 1 0
+// Triangle 1 0.5 0 0
+//
+// Some waveforms can be produced in multiple ways, which is useful when morphing
+// into other waveforms, and there are a few more notable shapes:
+//
+// peak half zero
+// reflect time height wait
+// Sawtooth 1 1 any 0
+// Sawtooth (8va) 1 0 -1 0
+// Triangle 1 0.5 0 0
+// Square 1 0 1 0
+// Square 0 0 1 0
+// Triangle 0 0.5 0 0
+// Triangle 0 0 -1 0
+// AlternatingSaw 0 0 0 0
+// AlternatingSaw 0 1 any 0
+// Stairs 0 0 1 0.5
+//
+// The "Sawtooth (8va)" waveform is identical to a sawtooth wave with 2x the
+// frequency, but when morphed with other values, it becomes an overtone of
+// the base frequency.
+//
+// Morphing waveforms:
+//
+// Sweeping peak_time morphs the waveform while producing various spectra.
+// Sweeping half_height effectively crossfades between two waveforms; useful, but less exciting.
+// Sweeping zero_wait produces a similar effect no matter the rest of the waveform,
+// a sort of high-pass/PWM effect where the wave becomes silent at zero_wait=1.
+//
+// You can trivially morph between any two waveforms from the above table
+// which only differ in one column.
+//
+// Crossfade between classic waveforms:
+// peak half zero
+// Start End reflect time height wait
+// ----- --- ------- ---- ------ ----
+// Triangle Square 0 0 -1..1 0
+// Saw Square 1 0 0..1 0
+// Triangle Saw 1 0.5 0..2 0
+//
+// The last morph uses half-height values larger than 1, which means it will
+// be louder and the output should be scaled down by half to compensate, or better
+// by dynamically tracking the morph: volume_scale = 1 - half_height/4
+//
+// Non-crossfade morph between classic waveforms, most require changing
+// two parameters at the same time:
+// peak half zero
+// Start End reflect time height wait
+// ----- --- ------- ---- ------ ----
+// Square Triangle any 0..0.5 1..0 0
+// Square Saw 1 0..1 1..any 0
+// Triangle Saw 1 0.5..1 0..-1 0
+//
+// Other noteworthy morphs between simple shapes:
+// peak half zero
+// Start Halfway End reflect time height wait
+// ----- --------- --- ------- ---- ------ ----
+// Saw (8va,neg) Saw (pos) 1 0..1 -1 0
+// Saw (neg) Saw (pos) 1 0..1 0 0
+// Triangle AlternatingSaw 0 0..1 -1 0
+// AlternatingSaw Triangle AlternatingSaw 0 0..1 0 0
+// Square AlternatingSaw 0 0..1 1 0
+// Triangle Triangle AlternatingSaw 0 0..1 -1..1 0
+// Square AlternatingSaw 0 0..1 1..0 0
+// Saw (8va) Triangle Saw 1 0..1 -1..1 0
+// Saw (neg) Saw (pos) 1 0..1 0..1 0
+// AlternatingSaw AlternatingSaw 0 0..1 0..any 0
+//
+// The last entry is noteworthy because the morph from the halfway point to either
+// endpoint sounds very different. For example, an LFO sweeping back and forth over
+// the whole range will morph between the middle timbre and the AlternatingSaw
+// timbre in two different ways, alternating.
+//
+// Entries with "any" for half_height are whole families of morphs, as you can pick
+// any value you want as the endpoint for half_height.
+//
+// You can always morph between any two waveforms with the same value of 'reflect'
+// by just sweeping the parameters simultaneously. There will never be artifacts
+// and the result will always be useful, if not necessarily what you want.
+//
+// You can vary the sound of two-parameter morphs by ramping them differently,
+// e.g. if the morph goes from t=0..1, then square-to-triangle looks like:
+// peak_time = lerp(t, 0, 0.5)
+// half_height = lerp(t, 1, 0 )
+// but you can also do things like:
+// peak_time = lerp(smoothstep(t), 0, 0.5)
+// half_height = cos(PI/2 * t)
+//
+// How it works:
+//
+// hexwave uses BLEP to bandlimit discontinuities and BLAMP
+// to bandlimit C1 discontinuities. This is not polyBLEP
+// (polynomial BLEP), it is table-driven BLEP. It is
+// also not minBLEP (minimum-phase BLEP), as that complicates
+// things for little benefit once BLAMP is involved.
+//
+// The previous oscillator frequency is remembered, and when
+// the frequency changes, a BLAMP is generated to remove the
+// C1 discontinuity, which reduces artifacts for sweeps/LFO.
+//
+// Changes to an oscillator timbre using hexwave_change() actually
+// wait until the oscillator finishes its current cycle. All
+// waveforms with non-zero "zero_wait" settings pass through 0
+// and have 0-slope at the start of a cycle, which means changing
+// the settings is artifact free at that time. (If zero_wait is 0,
+// the code still treats it as passing through 0 with 0-slope; it'll
+// apply the necessary fixups to make it artifact free as if it does
+// transition to 0 with 0-slope vs. the waveform at the end of
+// the cycle, then adds the fixups for a non-0 value and non-0 slope
+// at the start of the cycle, which cancels out if zero_wait is 0,
+// and still does the right thing if zero_wait is 0 when the
+// settings are updated.)
+//
+// BLEP/BLAMP normally requires overlapping buffers, but this
+// is hidden from the user by generating the waveform to a
+// temporary buffer and saving the overlap regions internally
+// between calls. (It is slightly more complicated; see code.)
+//
+// By design all shapes have 0 DC offset; this is one reason
+// hexwave uses zero_wait instead of standard PWM.
+//
+// The internals of hexwave could support any arbitrary shape
+// made of line segments, but I chose not to expose this
+// generality in favor of a simple, easy-to-use API.
+
+#ifndef STB_INCLUDE_STB_HEXWAVE_H
+#define STB_INCLUDE_STB_HEXWAVE_H
+
+#ifndef STB_HEXWAVE_MAX_BLEP_LENGTH
+#define STB_HEXWAVE_MAX_BLEP_LENGTH 64 // good enough for anybody
+#endif
+
+#ifdef STB_HEXWAVE_STATIC
+#define STB_HEXWAVE_DEF static
+#else
+#define STB_HEXWAVE_DEF extern
+#endif
+
+typedef struct HexWave HexWave;
+
+STB_HEXWAVE_DEF void hexwave_init(int width, int oversample, float *user_buffer);
+// width: size of BLEP, from 4..64, larger is slower & more memory but less aliasing
+// oversample: 2+, number of subsample positions, larger uses more memory but less noise
+// user_buffer: optional, if provided the library will perform no allocations.
+// 16*width*(oversample+1) bytes, must stay allocated as long as library is used
+// technically it only needs: 8*( width * (oversample + 1))
+// + 8*((width * oversample) + 1) bytes
+//
+// width can be larger than 64 if you define STB_HEXWAVE_MAX_BLEP_LENGTH to a larger value
+
+STB_HEXWAVE_DEF void hexwave_shutdown(float *user_buffer);
+// user_buffer: pass in same parameter as passed to hexwave_init
+
+STB_HEXWAVE_DEF void hexwave_create(HexWave *hex, int reflect, float peak_time, float half_height, float zero_wait);
+// see docs above for description
+//
+// reflect is tested as 0 or non-zero
+// peak_time is clamped to 0..1
+// half_height is not clamped
+// zero_wait is clamped to 0..1
+
+STB_HEXWAVE_DEF void hexwave_change(HexWave *hex, int reflect, float peak_time, float half_height, float zero_wait);
+// see docs
+
+STB_HEXWAVE_DEF void hexwave_generate_samples(float *output, int num_samples, HexWave *hex, float freq);
+// output: buffer where the library will store generated floating point audio samples
+// num_samples: the number of audio samples to generate
+// hex: pointer to a HexWave initialized with 'hexwave_create'
+// freq: frequency of the oscillator divided by the sample rate
+
+// private:
+typedef struct
+{
+ int reflect;
+ float peak_time;
+ float zero_wait;
+ float half_height;
+} HexWaveParameters;
+
+struct HexWave
+{
+ float t, prev_dt;
+ HexWaveParameters current, pending;
+ int have_pending;
+ float buffer[STB_HEXWAVE_MAX_BLEP_LENGTH];
+};
+#endif
+
+#ifdef STB_HEXWAVE_IMPLEMENTATION
+
+#ifndef STB_HEXWAVE_NO_ALLOCATION
+#include <stdlib.h> // malloc,free
+#endif
+
+#include <string.h> // memset,memcpy,memmove
+#include <math.h>   // sin,cos,fabs
+
+#define hexwave_clamp(v,a,b) ((v) < (a) ? (a) : (v) > (b) ? (b) : (v))
+
+STB_HEXWAVE_DEF void hexwave_change(HexWave *hex, int reflect, float peak_time, float half_height, float zero_wait)
+{
+ hex->pending.reflect = reflect;
+ hex->pending.peak_time = hexwave_clamp(peak_time,0,1);
+ hex->pending.half_height = half_height;
+ hex->pending.zero_wait = hexwave_clamp(zero_wait,0,1);
+ // put a barrier here to allow changing from a different thread than the generator
+ hex->have_pending = 1;
+}
+
+STB_HEXWAVE_DEF void hexwave_create(HexWave *hex, int reflect, float peak_time, float half_height, float zero_wait)
+{
+ memset(hex, 0, sizeof(*hex));
+ hexwave_change(hex, reflect, peak_time, half_height, zero_wait);
+ hex->current = hex->pending;
+ hex->have_pending = 0;
+ hex->t = 0;
+ hex->prev_dt = 0;
+}
+
+static struct
+{
+ int width; // width of fixup in samples
+ int oversample; // number of oversampled versions (there's actually one more to allow lerping)
+ float *blep;
+ float *blamp;
+} hexblep;
+
+static void hex_add_oversampled_bleplike(float *output, float time_since_transition, float scale, float *data)
+{
+ float *d1,*d2;
+ float lerpweight;
+ int i, bw = hexblep.width;
+
+ int slot = (int) (time_since_transition * hexblep.oversample);
+ if (slot >= hexblep.oversample)
+ slot = hexblep.oversample-1; // clamp in case the floats overshoot
+
+ d1 = &data[ slot *bw];
+ d2 = &data[(slot+1)*bw];
+
+ lerpweight = time_since_transition * hexblep.oversample - slot;
+ for (i=0; i < bw; ++i)
+ output[i] += scale * (d1[i] + (d2[i]-d1[i])*lerpweight);
+}
+
+static void hex_blep (float *output, float time_since_transition, float scale)
+{
+ hex_add_oversampled_bleplike(output, time_since_transition, scale, hexblep.blep);
+}
+
+static void hex_blamp(float *output, float time_since_transition, float scale)
+{
+ hex_add_oversampled_bleplike(output, time_since_transition, scale, hexblep.blamp);
+}
+
+typedef struct
+{
+ float t,v,s; // time, value, slope
+} hexvert;
+
+// each half of the waveform needs 4 vertices to represent 3 line
+// segments, plus 1 more for wraparound
+static void hexwave_generate_linesegs(hexvert vert[9], HexWave *hex, float dt)
+{
+ int j;
+ float min_len = dt / 256.0f;
+
+ vert[0].t = 0;
+ vert[0].v = 0;
+ vert[1].t = hex->current.zero_wait*0.5f;
+ vert[1].v = 0;
+ vert[2].t = 0.5f*hex->current.peak_time + vert[1].t*(1-hex->current.peak_time);
+ vert[2].v = 1;
+ vert[3].t = 0.5f;
+ vert[3].v = hex->current.half_height;
+
+ if (hex->current.reflect) {
+ for (j=4; j <= 7; ++j) {
+ vert[j].t = 1 - vert[7-j].t;
+ vert[j].v = - vert[7-j].v;
+ }
+ } else {
+ for (j=4; j <= 7; ++j) {
+ vert[j].t = 0.5f + vert[j-4].t;
+ vert[j].v = - vert[j-4].v;
+ }
+ }
+ vert[8].t = 1;
+ vert[8].v = 0;
+
+ for (j=0; j < 8; ++j) {
+ if (vert[j+1].t <= vert[j].t + min_len) {
+ // if change takes place over less than a fraction of a sample treat as discontinuity
+ //
+ // otherwise the slope computation can blow up to arbitrarily large and we
+ // try to generate a huge BLAMP and the result is wrong.
+ //
+ // why does this happen if the math is right? i believe if done perfectly,
+ // the two BLAMPs on either side of the slope would cancel out, but our
+ // BLAMPs have only limited sub-sample precision and limited integration
+ // accuracy. or maybe it's just the math blowing up w/ floating point precision
+ // limits as we try to make x * (1/x) cancel out
+ //
+ // min_len verified artifact-free even near nyquist with only oversample=4
+ vert[j+1].t = vert[j].t;
+ }
+ }
+
+ if (vert[8].t != 1.0f) {
+ // if the above fixup moved the endpoint away from 1.0, move it back,
+ // along with any other vertices that got moved to the same time
+ float t = vert[8].t;
+ for (j=5; j <= 8; ++j)
+ if (vert[j].t == t)
+ vert[j].t = 1.0f;
+ }
+
+ // compute the exact slopes from the final fixed-up positions
+ for (j=0; j < 8; ++j)
+ if (vert[j+1].t == vert[j].t)
+ vert[j].s = 0;
+ else
+ vert[j].s = (vert[j+1].v - vert[j].v) / (vert[j+1].t - vert[j].t);
+
+ // wraparound at end
+ vert[8].t = 1;
+ vert[8].v = vert[0].v;
+ vert[8].s = vert[0].s;
+}
+
+STB_HEXWAVE_DEF void hexwave_generate_samples(float *output, int num_samples, HexWave *hex, float freq)
+{
+ hexvert vert[9];
+ int pass,i,j;
+ float t = hex->t;
+ float temp_output[2*STB_HEXWAVE_MAX_BLEP_LENGTH];
+ int buffered_length = sizeof(float)*hexblep.width;
+ float dt = (float) fabs(freq);
+ float recip_dt = (dt == 0.0f) ? 0.0f : 1.0f / dt;
+
+ int halfw = hexblep.width/2;
+ // all sample times are biased by halfw to leave room for BLEP/BLAMP to go back in time
+
+ if (num_samples <= 0)
+ return;
+
+ // convert parameters to times and slopes
+ hexwave_generate_linesegs(vert, hex, dt);
+
+ if (hex->prev_dt != dt) {
+ // if frequency changes, add a fixup at the derivative discontinuity starting at now
+ float slope;
+ for (j=1; j < 6; ++j)
+ if (t < vert[j].t)
+ break;
+ slope = vert[j].s;
+ if (slope != 0)
+ hex_blamp(output, 0, (dt - hex->prev_dt)*slope);
+ hex->prev_dt = dt;
+ }
+
+ // copy the buffered data from last call and clear the rest of the output array
+ memset(output, 0, sizeof(float)*num_samples);
+ memset(temp_output, 0, 2*hexblep.width*sizeof(float));
+
+ if (num_samples >= hexblep.width) {
+ memcpy(output, hex->buffer, buffered_length);
+ } else {
+ // if the output is shorter than hexblep.width, we do all synthesis to temp_output
+ memcpy(temp_output, hex->buffer, buffered_length);
+ }
+
+ for (pass=0; pass < 2; ++pass) {
+ int i0,i1;
+ float *out;
+
+ // we want to simulate having one buffer that is num_output + hexblep.width
+ // samples long, without putting that requirement on the user, and without
+ // allocating a temp buffer that's as long as the whole thing. so we use two
+ // overlapping buffers, one the user's buffer and one a fixed-length temp
+ // buffer.
+
+ if (pass == 0) {
+ if (num_samples < hexblep.width)
+ continue;
+ // run as far as we can without overwriting the end of the user's buffer
+ out = output;
+ i0 = 0;
+ i1 = num_samples - hexblep.width;
+ } else {
+ // generate the rest into a temp buffer
+ out = temp_output;
+ i0 = 0;
+ if (num_samples >= hexblep.width)
+ i1 = hexblep.width;
+ else
+ i1 = num_samples;
+ }
+
+ // determine current segment
+ for (j=0; j < 8; ++j)
+ if (t < vert[j+1].t)
+ break;
+
+ i = i0;
+ for(;;) {
+ while (t < vert[j+1].t) {
+ if (i == i1)
+ goto done;
+ out[i+halfw] += vert[j].v + vert[j].s*(t - vert[j].t);
+ t += dt;
+ ++i;
+ }
+ // transition from lineseg starting at j to lineseg starting at j+1
+
+ if (vert[j].t == vert[j+1].t)
+ hex_blep(out+i, recip_dt*(t-vert[j+1].t), (vert[j+1].v - vert[j].v));
+ hex_blamp(out+i, recip_dt*(t-vert[j+1].t), dt*(vert[j+1].s - vert[j].s));
+ ++j;
+
+ if (j == 8) {
+ // change to different waveform if there's a change pending
+ j = 0;
+ t -= 1.0; // t was >= 1.f if j==8
+ if (hex->have_pending) {
+ float prev_s0 = vert[j].s;
+ float prev_v0 = vert[j].v;
+ hex->current = hex->pending;
+ hex->have_pending = 0;
+ hexwave_generate_linesegs(vert, hex, dt);
+ // the following never occurs with this oscillator, but it makes
+ // the code work in more general cases
+ if (vert[j].v != prev_v0)
+ hex_blep (out+i, recip_dt*t, (vert[j].v - prev_v0));
+ if (vert[j].s != prev_s0)
+ hex_blamp(out+i, recip_dt*t, dt*(vert[j].s - prev_s0));
+ }
+ }
+ }
+ done:
+ ;
+ }
+
+ // at this point, we've written output[] and temp_output[]
+ if (num_samples >= hexblep.width) {
+ // the first half of temp[] overlaps the end of output, the second half will be the new start overlap
+ for (i=0; i < hexblep.width; ++i)
+ output[num_samples-hexblep.width + i] += temp_output[i];
+ memcpy(hex->buffer, temp_output+hexblep.width, buffered_length);
+ } else {
+ for (i=0; i < num_samples; ++i)
+ output[i] = temp_output[i];
+ memcpy(hex->buffer, temp_output+num_samples, buffered_length);
+ }
+
+ hex->t = t;
+}
+
+STB_HEXWAVE_DEF void hexwave_shutdown(float *user_buffer)
+{
+ #ifndef STB_HEXWAVE_NO_ALLOCATION
+ if (user_buffer == 0) {
+ free(hexblep.blep);
+ free(hexblep.blamp);
+ }
+ #endif
+}
+
+// buffer should be NULL or must be 4*(width*(oversample+1)*2 +
+STB_HEXWAVE_DEF void hexwave_init(int width, int oversample, float *user_buffer)
+{
+ int halfwidth = width/2;
+ int half = halfwidth*oversample;
+ int blep_buffer_count = width*(oversample+1);
+ int n = 2*half+1;
+#ifdef STB_HEXWAVE_NO_ALLOCATION
+ float *buffers = user_buffer;
+#else
+ float *buffers = user_buffer ? user_buffer : (float *) malloc(sizeof(float) * n * 2);
+#endif
+ float *step = buffers+0*n;
+ float *ramp = buffers+1*n;
+ float *blep_buffer, *blamp_buffer;
+ double integrate_impulse=0, integrate_step=0;
+ int i,j;
+
+ if (width > STB_HEXWAVE_MAX_BLEP_LENGTH)
+ width = STB_HEXWAVE_MAX_BLEP_LENGTH;
+
+ if (user_buffer == 0) {
+ #ifndef STB_HEXWAVE_NO_ALLOCATION
+ blep_buffer = (float *) malloc(sizeof(float)*blep_buffer_count);
+ blamp_buffer = (float *) malloc(sizeof(float)*blep_buffer_count);
+ #endif
+ } else {
+ blep_buffer = ramp+n;
+ blamp_buffer = blep_buffer + blep_buffer_count;
+ }
+
+ // compute BLEP and BLAMP by integrating windowed sinc
+ for (i=0; i < n; ++i) {
+ for (j=0; j < 16; ++j) {
+ float sinc_t = 3.141592f* (i-half) / oversample;
+ float sinc = (i==half) ? 1.0f : (float) sin(sinc_t) / (sinc_t);
+ float wt = 2.0f*3.1415926f * i / (n-1);
+ float window = (float) (0.355768 - 0.487396*cos(wt) + 0.144232*cos(2*wt) - 0.012604*cos(3*wt)); // Nuttall
+ double value = window * sinc;
+ integrate_impulse += value/16;
+ integrate_step += integrate_impulse/16;
+ }
+ step[i] = (float) integrate_impulse;
+ ramp[i] = (float) integrate_step;
+ }
+
+ // renormalize
+ for (i=0; i < n; ++i) {
+ step[i] = step[i] * (float) (1.0 / step[n-1]); // step needs to reach to 1.0
+ ramp[i] = ramp[i] * (float) (halfwidth / ramp[n-1]); // ramp needs to become a slope of 1.0 after oversampling
+ }
+
+ // deinterleave to allow efficient interpolation e.g. w/SIMD
+ for (j=0; j <= oversample; ++j) {
+ for (i=0; i < width; ++i) {
+ blep_buffer [j*width+i] = step[j+i*oversample];
+ blamp_buffer[j*width+i] = ramp[j+i*oversample];
+ }
+ }
+
+ // subtract out the naive waveform; note we can't do this to the raw data
+ // above, because we want the discontinuity to be in different locations
+ // for j=0 and j=oversample (which exists to provide something to interpolate against)
+ for (j=0; j <= oversample; ++j) {
+ // subtract step
+ for (i=halfwidth; i < width; ++i)
+ blep_buffer [j*width+i] -= 1.0f;
+ // subtract ramp
+ for (i=halfwidth; i < width; ++i)
+ blamp_buffer[j*width+i] -= (j+i*oversample-half)*(1.0f/oversample);
+ }
+
+ hexblep.blep = blep_buffer;
+ hexblep.blamp = blamp_buffer;
+ hexblep.width = width;
+ hexblep.oversample = oversample;
+
+ #ifndef STB_HEXWAVE_NO_ALLOCATION
+ if (user_buffer == 0)
+ free(buffers);
+ #endif
+}
+#endif // STB_HEXWAVE_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_image.h b/vendor/stb/stb_image.h
new file mode 100644
index 0000000..9eedabe
--- /dev/null
+++ b/vendor/stb/stb_image.h
@@ -0,0 +1,7988 @@
+/* stb_image - v2.30 - public domain image loader - http://nothings.org/stb
+ no warranty implied; use at your own risk
+
+ Do this:
+ #define STB_IMAGE_IMPLEMENTATION
+ before you include this file in *one* C or C++ file to create the implementation.
+
+ // i.e. it should look like this:
+ #include ...
+ #include ...
+ #include ...
+ #define STB_IMAGE_IMPLEMENTATION
+ #include "stb_image.h"
+
+ You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
+ And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
+
+
+ QUICK NOTES:
+ Primarily of interest to game developers and other people who can
+ avoid problematic images and only need the trivial interface
+
+ JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
+ PNG 1/2/4/8/16-bit-per-channel
+
+ TGA (not sure what subset, if a subset)
+ BMP non-1bpp, non-RLE
+ PSD (composited view only, no extra channels, 8/16 bit-per-channel)
+
+ GIF (*comp always reports as 4-channel)
+ HDR (radiance rgbE format)
+ PIC (Softimage PIC)
+ PNM (PPM and PGM binary only)
+
+ Animated GIF still needs a proper API, but here's one way to do it:
+ http://gist.github.com/urraka/685d9a6340b26b830d49
+
+ - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
+ - decode from arbitrary I/O callbacks
+ - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
+
+ Full documentation under "DOCUMENTATION" below.
+
+
+LICENSE
+
+ See end of file for license information.
+
+RECENT REVISION HISTORY:
+
+ 2.30 (2024-05-31) avoid erroneous gcc warning
+ 2.29 (2023-05-xx) optimizations
+ 2.28 (2023-01-29) many error fixes, security errors, just tons of stuff
+ 2.27 (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
+ 2.26 (2020-07-13) many minor fixes
+ 2.25 (2020-02-02) fix warnings
+ 2.24 (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
+ 2.23 (2019-08-11) fix clang static analysis warning
+ 2.22 (2019-03-04) gif fixes, fix warnings
+ 2.21 (2019-02-25) fix typo in comment
+ 2.20 (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
+ 2.19 (2018-02-11) fix warning
+ 2.18 (2018-01-30) fix warnings
+ 2.17 (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
+ 2.16 (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
+ 2.15 (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
+ 2.14 (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
+ 2.13 (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
+ 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
+ 2.11 (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
+ RGB-format JPEG; remove white matting in PSD;
+ allocate large structures on the stack;
+ correct channel count for PNG & BMP
+ 2.10 (2016-01-22) avoid warning introduced in 2.09
+ 2.09 (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
+
+ See end of file for full revision history.
+
+
+ ============================ Contributors =========================
+
+ Image formats                         Extensions, features
+    Sean Barrett (jpeg, png, bmp)         Jetro Lauha (stbi_info)
+    Nicolas Schulz (hdr, psd)             Martin "SpartanJ" Golini (stbi_info)
+    Jonathan Dummer (tga)                 James "moose2000" Brown (iPhone PNG)
+    Jean-Marc Lienher (gif)               Ben "Disch" Wenger (io callbacks)
+    Tom Seddon (pic)                      Omar Cornut (1/2/4-bit PNG)
+    Thatcher Ulrich (psd)                 Nicolas Guillemot (vertical flip)
+    Ken Miller (pgm, ppm)                 Richard Mitton (16-bit PSD)
+    github:urraka (animated gif)          Junggon Kim (PNM comments)
+    Christopher Forseth (animated gif)    Daniel Gibson (16-bit TGA)
+                                          socks-the-fox (16-bit PNG)
+                                          Jeremy Sawicki (handle all ImageNet JPGs)
+ Optimizations & bugfixes                 Mikhail Morozov (1-bit BMP)
+    Fabian "ryg" Giesen                   Anael Seghezzi (is-16-bit query)
+    Arseny Kapoulkine                     Simon Breuss (16-bit PNM)
+    John-Mark Allen
+    Carmelo J Fdez-Aguera
+
+ Bug & warning fixes
+    Marc LeBlanc           David Woo            Guillaume George      Martins Mozeiko
+    Christpher Lloyd       Jerry Jansson        Joseph Thomson        Blazej Dariusz Roszkowski
+    Phil Jordan            Dave Moore           Roy Eltham
+    Hayaki Saito           Nathan Reed          Won Chun
+    Luke Graham            Johan Duparc         Nick Verigakis        the Horde3D community
+    Thomas Ruf             Ronny Chevalier      github:rlyeh
+    Janez Zemva            John Bartholomew     Michal Cichon         github:romigrou
+    Jonathan Blow          Ken Hamada           Tero Hanninen         github:svdijk
+    Eugene Golushkov       Laurent Gomila       Cort Stratton         github:snagar
+    Aruelien Pocheville    Sergio Gonzalez      Thibault Reuille      github:Zelex
+    Cass Everitt           Ryamond Barbiero     github:grim210
+    Paul Du Bois           Engin Manap          Aldo Culquicondor     github:sammyhw
+    Philipp Wiesemann      Dale Weiler          Oriol Ferrer Mesia    github:phprus
+    Josh Tobin             Neil Bickford        Matthew Gregan        github:poppolopoppo
+    Julian Raschke         Gregory Mullen       Christian Floisand    github:darealshinji
+    Baldur Karlsson        Kevin Schmidt        JR Smith              github:Michaelangel007
+    Brad Weinberger        Matvey Cherevko      github:mosra
+    Luca Sas               Alexander Veselov    Zack Middleton        [reserved]
+    Ryan C. Gordon         [reserved]           [reserved]
+                           DO NOT ADD YOUR NAME HERE
+
+    Jacko Dirks
+
+ To add your name to the credits, pick a random blank space in the middle and fill it.
+ 80% of merge conflicts on stb PRs are due to people adding their name at the end
+ of the credits.
+*/
+
+#ifndef STBI_INCLUDE_STB_IMAGE_H
+#define STBI_INCLUDE_STB_IMAGE_H
+
+// DOCUMENTATION
+//
+// Limitations:
+// - no 12-bit-per-channel JPEG
+// - no JPEGs with arithmetic coding
+// - GIF always returns *comp=4
+//
+// Basic usage (see HDR discussion below for HDR usage):
+// int x,y,n;
+// unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
+// // ... process data if not NULL ...
+// // ... x = width, y = height, n = # 8-bit components per pixel ...
+// // ... replace '0' with '1'..'4' to force that many components per pixel
+// // ... but 'n' will always be the number that it would have been if you said 0
+// stbi_image_free(data);
+//
+// Standard parameters:
+// int *x -- outputs image width in pixels
+// int *y -- outputs image height in pixels
+// int *channels_in_file -- outputs # of image components in image file
+// int desired_channels -- if non-zero, # of image components requested in result
+//
+// The return value from an image loader is an 'unsigned char *' which points
+// to the pixel data, or NULL on an allocation failure or if the image is
+// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
+// with each pixel consisting of N interleaved 8-bit components; the first
+// pixel pointed to is top-left-most in the image. There is no padding between
+// image scanlines or between pixels, regardless of format. The number of
+// components N is 'desired_channels' if desired_channels is non-zero, or
+// *channels_in_file otherwise. If desired_channels is non-zero,
+// *channels_in_file has the number of components that _would_ have been
+// output otherwise. E.g. if you set desired_channels to 4, you will always
+// get RGBA output, but you can check *channels_in_file to see if it's trivially
+// opaque because e.g. there were only 3 channels in the source image.
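+//
+// For example (a sketch; "photo.png" is a placeholder filename):
+//
+//    int w, h, n;
+//    unsigned char *rgba = stbi_load("photo.png", &w, &h, &n, 4);
+//    if (rgba) {
+//       // rgba has 4 components per pixel; n reports how many the
+//       // file itself had (e.g. n == 3 means the source had no alpha)
+//       stbi_image_free(rgba);
+//    }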
+//
+// An output image with N components has the following components interleaved
+// in this order in each pixel:
+//
+// N=#comp components
+// 1 grey
+// 2 grey, alpha
+// 3 red, green, blue
+// 4 red, green, blue, alpha
+//
+// If image loading fails for any reason, the return value will be NULL,
+// and *x, *y, *channels_in_file will be unchanged. The function
+// stbi_failure_reason() can be queried for an extremely brief, end-user
+// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
+// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
+// more user-friendly ones.
+//
+// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
+//
+// To query the width, height and component count of an image without having to
+// decode the full file, you can use the stbi_info family of functions:
+//
+// int x,y,n,ok;
+// ok = stbi_info(filename, &x, &y, &n);
+// // returns ok=1 and sets x, y, n if image is a supported format,
+// // 0 otherwise.
+//
+// Note that stb_image pervasively uses ints in its public API for sizes,
+// including sizes of memory buffers. This is now part of the API and thus
+// hard to change without causing breakage. As a result, the various image
+// loaders all have certain limits on image size; these differ somewhat
+// by format but generally boil down to either just under 2GB or just under
+// 1GB. When the decoded image would be larger than this, stb_image decoding
+// will fail.
+//
+// Additionally, stb_image will reject image files that have any of their
+// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
+// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
+// the only way to have an image with such dimensions load correctly
+// is for it to have a rather extreme aspect ratio. Either way, the
+// assumption here is that such larger images are likely to be malformed
+// or malicious. If you do need to load an image with individual dimensions
+// larger than that, and it still fits in the overall size limit, you can
+// #define STBI_MAX_DIMENSIONS on your own to be something larger.
+//
+// ===========================================================================
+//
+// UNICODE:
+//
+// If compiling for Windows and you wish to use Unicode filenames, compile
+// with
+// #define STBI_WINDOWS_UTF8
+// and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
+// Windows wchar_t filenames to utf8.
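+//
+// For example (a sketch; the buffer size here is an arbitrary choice):
+//
+//    #define STBI_WINDOWS_UTF8
+//    ...
+//    char name_utf8[1024];
+//    stbi_convert_wchar_to_utf8(name_utf8, sizeof(name_utf8), wide_name);
+//    unsigned char *data = stbi_load(name_utf8, &x, &y, &n, 0);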
+//
+// ===========================================================================
+//
+// Philosophy
+//
+// stb libraries are designed with the following priorities:
+//
+// 1. easy to use
+// 2. easy to maintain
+// 3. good performance
+//
+// Sometimes I let "good performance" creep up in priority over "easy to maintain",
+// and for best performance I may provide less-easy-to-use APIs that give higher
+// performance, in addition to the easy-to-use ones. Nevertheless, it's important
+// to keep in mind that from the standpoint of you, a client of this library,
+// all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
+//
+// Some secondary priorities arise directly from the first two, some of which
+// provide more explicit reasons why performance can't be emphasized.
+//
+// - Portable ("ease of use")
+// - Small source code footprint ("easy to maintain")
+// - No dependencies ("ease of use")
+//
+// ===========================================================================
+//
+// I/O callbacks
+//
+// I/O callbacks allow you to read from arbitrary sources, like packaged
+// files or some other source. Data read from callbacks are processed
+// through a small internal buffer (currently 128 bytes) to try to reduce
+// overhead.
+//
+// The three functions you must define are "read" (reads some bytes of data),
+// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
+//
+// ===========================================================================
+//
+// SIMD support
+//
+// The JPEG decoder will try to automatically use SIMD kernels on x86 when
+// supported by the compiler. For ARM Neon support, you must explicitly
+// request it.
+//
+// (The old do-it-yourself SIMD API is no longer supported in the current
+// code.)
+//
+// On x86, SSE2 will automatically be used when available based on a run-time
+// test; if not, the generic C versions are used as a fall-back. On ARM targets,
+// the typical path is to have separate builds for NEON and non-NEON devices
+// (at least this is true for iOS and Android). Therefore, the NEON support is
+// toggled by a build flag: define STBI_NEON to get NEON loops.
+//
+// If for some reason you do not want to use any of the SIMD code, or if
+// you have issues compiling it, you can disable it entirely by
+// defining STBI_NO_SIMD.
+//
+// ===========================================================================
+//
+// HDR image support (disable by defining STBI_NO_HDR)
+//
+// stb_image supports loading HDR images in general, and currently the Radiance
+// .HDR file format specifically. You can still load any file through the existing
+// interface; if you attempt to load an HDR file, it will be automatically remapped
+// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
+// both of these constants can be reconfigured through this interface:
+//
+// stbi_hdr_to_ldr_gamma(2.2f);
+// stbi_hdr_to_ldr_scale(1.0f);
+//
+// (note, do not use _inverse_ constants; stb_image will invert them
+// appropriately).
+//
+// Additionally, there is a new, parallel interface for loading files as
+// (linear) floats to preserve the full dynamic range:
+//
+// float *data = stbi_loadf(filename, &x, &y, &n, 0);
+//
+// If you load LDR images through this interface, those images will
+// be promoted to floating point values, run through the inverse of
+// constants corresponding to the above:
+//
+// stbi_ldr_to_hdr_scale(1.0f);
+// stbi_ldr_to_hdr_gamma(2.2f);
+//
+// Finally, given a filename (or an open file or memory block--see header
+// file for details) containing image data, you can query for the "most
+// appropriate" interface to use (that is, whether the image is HDR or
+// not), using:
+//
+// stbi_is_hdr(char *filename);
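+//
+// A typical pattern combining the two interfaces (a sketch):
+//
+//    if (stbi_is_hdr(filename)) {
+//       float *hdr = stbi_loadf(filename, &x, &y, &n, 0); // linear floats
+//       ...
+//    } else {
+//       unsigned char *ldr = stbi_load(filename, &x, &y, &n, 0); // 8-bit
+//       ...
+//    }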
+//
+// ===========================================================================
+//
+// iPhone PNG support:
+//
+// We optionally support converting iPhone-formatted PNGs (which store
+// premultiplied BGRA) back to RGB, even though they're internally encoded
+// differently. To enable this conversion, call
+// stbi_convert_iphone_png_to_rgb(1).
+//
+// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
+// pixel to remove any premultiplied alpha *only* if the image file explicitly
+// says there's premultiplied data (currently only happens in iPhone images,
+// and only if iPhone convert-to-rgb processing is on).
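+//
+// For example, to get straight (non-premultiplied) RGB from such files:
+//
+//    stbi_convert_iphone_png_to_rgb(1);
+//    stbi_set_unpremultiply_on_load(1);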
+//
+// ===========================================================================
+//
+// ADDITIONAL CONFIGURATION
+//
+// - You can suppress implementation of any of the decoders to reduce
+// your code footprint by #defining one or more of the following
+// symbols before creating the implementation.
+//
+// STBI_NO_JPEG
+// STBI_NO_PNG
+// STBI_NO_BMP
+// STBI_NO_PSD
+// STBI_NO_TGA
+// STBI_NO_GIF
+// STBI_NO_HDR
+// STBI_NO_PIC
+// STBI_NO_PNM (.ppm and .pgm)
+//
+// - You can request *only* certain decoders and suppress all other ones
+// (this will be more forward-compatible, as addition of new decoders
+// doesn't require you to disable them explicitly):
+//
+// STBI_ONLY_JPEG
+// STBI_ONLY_PNG
+// STBI_ONLY_BMP
+// STBI_ONLY_PSD
+// STBI_ONLY_TGA
+// STBI_ONLY_GIF
+// STBI_ONLY_HDR
+// STBI_ONLY_PIC
+// STBI_ONLY_PNM (.ppm and .pgm)
+//
+// - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
+// want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
+//
+// - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
+// than that size (in either width or height) without further processing.
+// This is to let programs in the wild set an upper bound to prevent
+// denial-of-service attacks on untrusted data, as one could generate a
+// valid image of gigantic dimensions and force stb_image to allocate a
+// huge block of memory and spend disproportionate time decoding it. By
+// default this is set to (1 << 24), which is 16777216, but that's still
+// very big.
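+//
+// For example, to cap both dimensions at 8192 pixels (an arbitrary
+// choice), define the macro before creating the implementation:
+//
+//    #define STBI_MAX_DIMENSIONS 8192
+//    #define STB_IMAGE_IMPLEMENTATION
+//    #include "stb_image.h"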
+
+#ifndef STBI_NO_STDIO
+#include <stdio.h>
+#endif // STBI_NO_STDIO
+
+#define STBI_VERSION 1
+
+enum
+{
+ STBI_default = 0, // only used for desired_channels
+
+ STBI_grey = 1,
+ STBI_grey_alpha = 2,
+ STBI_rgb = 3,
+ STBI_rgb_alpha = 4
+};
+
+#include <stdlib.h>
+typedef unsigned char stbi_uc;
+typedef unsigned short stbi_us;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifndef STBIDEF
+#ifdef STB_IMAGE_STATIC
+#define STBIDEF static
+#else
+#define STBIDEF extern
+#endif
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// PRIMARY API - works on images of any type
+//
+
+//
+// load image by filename, open file, or memory buffer
+//
+
+typedef struct
+{
+ int (*read) (void *user,char *data,int size); // fill 'data' with 'size' bytes. return number of bytes actually read
+ void (*skip) (void *user,int n); // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
+ int (*eof) (void *user); // returns nonzero if we are at end of file/data
+} stbi_io_callbacks;
+
+////////////////////////////////////
+//
+// 8-bits-per-channel interface
+//
+
+STBIDEF stbi_uc *stbi_load_from_memory (stbi_uc const *buffer, int len , int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
+
+#ifndef STBI_NO_STDIO
+STBIDEF stbi_uc *stbi_load (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_uc *stbi_load_from_file (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
+// for stbi_load_from_file, file pointer is left pointing immediately after image
+#endif
+
+#ifndef STBI_NO_GIF
+STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
+#endif
+
+#ifdef STBI_WINDOWS_UTF8
+STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
+#endif
+
+////////////////////////////////////
+//
+// 16-bits-per-channel interface
+//
+
+STBIDEF stbi_us *stbi_load_16_from_memory (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
+
+#ifndef STBI_NO_STDIO
+STBIDEF stbi_us *stbi_load_16 (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
+#endif
+
+////////////////////////////////////
+//
+// float-per-channel interface
+//
+#ifndef STBI_NO_LINEAR
+ STBIDEF float *stbi_loadf_from_memory (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
+ STBIDEF float *stbi_loadf_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
+
+ #ifndef STBI_NO_STDIO
+ STBIDEF float *stbi_loadf (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
+ STBIDEF float *stbi_loadf_from_file (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
+ #endif
+#endif
+
+#ifndef STBI_NO_HDR
+ STBIDEF void stbi_hdr_to_ldr_gamma(float gamma);
+ STBIDEF void stbi_hdr_to_ldr_scale(float scale);
+#endif // STBI_NO_HDR
+
+#ifndef STBI_NO_LINEAR
+ STBIDEF void stbi_ldr_to_hdr_gamma(float gamma);
+ STBIDEF void stbi_ldr_to_hdr_scale(float scale);
+#endif // STBI_NO_LINEAR
+
+// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
+STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
+STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
+#ifndef STBI_NO_STDIO
+STBIDEF int stbi_is_hdr (char const *filename);
+STBIDEF int stbi_is_hdr_from_file(FILE *f);
+#endif // STBI_NO_STDIO
+
+
+// get a VERY brief reason for failure
+// on most compilers (and ALL modern mainstream compilers) this is threadsafe
+STBIDEF const char *stbi_failure_reason (void);
+
+// free the loaded image -- this is just free()
+STBIDEF void stbi_image_free (void *retval_from_stbi_load);
+
+// get image dimensions & components without fully decoding
+STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
+STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
+STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
+STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
+
+#ifndef STBI_NO_STDIO
+STBIDEF int stbi_info (char const *filename, int *x, int *y, int *comp);
+STBIDEF int stbi_info_from_file (FILE *f, int *x, int *y, int *comp);
+STBIDEF int stbi_is_16_bit (char const *filename);
+STBIDEF int stbi_is_16_bit_from_file(FILE *f);
+#endif
+
+
+
+// for image formats that explicitly notate that they have premultiplied alpha,
+// we just return the colors as stored in the file. set this flag to force
+// unpremultiplication. results are undefined if the unpremultiply overflows.
+STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
+
+// indicate whether we should process iphone images back to canonical format,
+// or just pass them through "as-is"
+STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
+
+// flip the image vertically, so the first pixel in the output array is the bottom left
+STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
+
+// as above, but only applies to images loaded on the thread that calls the function
+// this function is only available if your compiler supports thread-local variables;
+// calling it will fail to link if your compiler doesn't
+STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
+STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
+STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
+
+// ZLIB client - used by PNG, available for other purposes
+
+STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
+STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
+STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
+STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
+
+STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
+STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
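+
+// e.g. (a sketch; 'buf' and 'len' describe a zlib stream you already have):
+//
+//    int outlen;
+//    char *out = stbi_zlib_decode_malloc(buf, len, &outlen);
+//    if (out) { /* use outlen bytes at out; free(out) when done,
+//                  assuming the default STBI_MALLOC/STBI_FREE */ }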
+
+
+#ifdef __cplusplus
+}
+#endif
+
+//
+//
+//// end header file /////////////////////////////////////////////////////
+#endif // STBI_INCLUDE_STB_IMAGE_H
+
+#ifdef STB_IMAGE_IMPLEMENTATION
+
+#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
+ || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
+ || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
+ || defined(STBI_ONLY_ZLIB)
+ #ifndef STBI_ONLY_JPEG
+ #define STBI_NO_JPEG
+ #endif
+ #ifndef STBI_ONLY_PNG
+ #define STBI_NO_PNG
+ #endif
+ #ifndef STBI_ONLY_BMP
+ #define STBI_NO_BMP
+ #endif
+ #ifndef STBI_ONLY_PSD
+ #define STBI_NO_PSD
+ #endif
+ #ifndef STBI_ONLY_TGA
+ #define STBI_NO_TGA
+ #endif
+ #ifndef STBI_ONLY_GIF
+ #define STBI_NO_GIF
+ #endif
+ #ifndef STBI_ONLY_HDR
+ #define STBI_NO_HDR
+ #endif
+ #ifndef STBI_ONLY_PIC
+ #define STBI_NO_PIC
+ #endif
+ #ifndef STBI_ONLY_PNM
+ #define STBI_NO_PNM
+ #endif
+#endif
+
+#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
+#define STBI_NO_ZLIB
+#endif
+
+
+#include <stdarg.h>
+#include <stddef.h> // ptrdiff_t on osx
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+
+#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
+#include <math.h>  // ldexp, pow
+#endif
+
+#ifndef STBI_NO_STDIO
+#include <stdio.h>
+#endif
+
+#ifndef STBI_ASSERT
+#include <assert.h>
+#define STBI_ASSERT(x) assert(x)
+#endif
+
+#ifdef __cplusplus
+#define STBI_EXTERN extern "C"
+#else
+#define STBI_EXTERN extern
+#endif
+
+
+#ifndef _MSC_VER
+ #ifdef __cplusplus
+ #define stbi_inline inline
+ #else
+ #define stbi_inline
+ #endif
+#else
+ #define stbi_inline __forceinline
+#endif
+
+#ifndef STBI_NO_THREAD_LOCALS
+ #if defined(__cplusplus) && __cplusplus >= 201103L
+ #define STBI_THREAD_LOCAL thread_local
+ #elif defined(__GNUC__) && __GNUC__ < 5
+ #define STBI_THREAD_LOCAL __thread
+ #elif defined(_MSC_VER)
+ #define STBI_THREAD_LOCAL __declspec(thread)
+ #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
+ #define STBI_THREAD_LOCAL _Thread_local
+ #endif
+
+ #ifndef STBI_THREAD_LOCAL
+ #if defined(__GNUC__)
+ #define STBI_THREAD_LOCAL __thread
+ #endif
+ #endif
+#endif
+
+#if defined(_MSC_VER) || defined(__SYMBIAN32__)
+typedef unsigned short stbi__uint16;
+typedef signed short stbi__int16;
+typedef unsigned int stbi__uint32;
+typedef signed int stbi__int32;
+#else
+#include <stdint.h>
+typedef uint16_t stbi__uint16;
+typedef int16_t stbi__int16;
+typedef uint32_t stbi__uint32;
+typedef int32_t stbi__int32;
+#endif
+
+// should produce compiler error if size is wrong
+typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
+
+#ifdef _MSC_VER
+#define STBI_NOTUSED(v) (void)(v)
+#else
+#define STBI_NOTUSED(v) (void)sizeof(v)
+#endif
+
+#ifdef _MSC_VER
+#define STBI_HAS_LROTL
+#endif
+
+#ifdef STBI_HAS_LROTL
+ #define stbi_lrot(x,y) _lrotl(x,y)
+#else
+ #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (-(y) & 31)))
+#endif
+
+#if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
+// ok
+#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
+// ok
+#else
+#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
+#endif
+
+#ifndef STBI_MALLOC
+#define STBI_MALLOC(sz) malloc(sz)
+#define STBI_REALLOC(p,newsz) realloc(p,newsz)
+#define STBI_FREE(p) free(p)
+#endif
+
+#ifndef STBI_REALLOC_SIZED
+#define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
+#endif
+
+// x86/x64 detection
+#if defined(__x86_64__) || defined(_M_X64)
+#define STBI__X64_TARGET
+#elif defined(__i386) || defined(_M_IX86)
+#define STBI__X86_TARGET
+#endif
+
+#if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
+// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
+// which in turn means it gets to use SSE2 everywhere. This is unfortunate,
+// but previous attempts to provide the SSE2 functions with runtime
+// detection caused numerous issues. The way architecture extensions are
+// exposed in GCC/Clang is, sadly, not really suited for one-file libs.
+// New behavior: if compiled with -msse2, we use SSE2 without any
+// detection; if not, we don't use it at all.
+#define STBI_NO_SIMD
+#endif
+
+#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
+// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
+//
+// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
+// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
+// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
+// simultaneously enabling "-mstackrealign".
+//
+// See https://github.com/nothings/stb/issues/81 for more information.
+//
+// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
+// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
+#define STBI_NO_SIMD
+#endif
+
+#if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
+#define STBI_SSE2
+#include <emmintrin.h>
+
+#ifdef _MSC_VER
+
+#if _MSC_VER >= 1400 // not VC6
+#include <intrin.h> // __cpuid
+static int stbi__cpuid3(void)
+{
+ int info[4];
+ __cpuid(info,1);
+ return info[3];
+}
+#else
+static int stbi__cpuid3(void)
+{
+ int res;
+ __asm {
+ mov eax,1
+ cpuid
+ mov res,edx
+ }
+ return res;
+}
+#endif
+
+#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
+
+#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
+static int stbi__sse2_available(void)
+{
+ int info3 = stbi__cpuid3();
+ return ((info3 >> 26) & 1) != 0;
+}
+#endif
+
+#else // assume GCC-style if not VC++
+#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
+
+#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
+static int stbi__sse2_available(void)
+{
+ // If we're even attempting to compile this on GCC/Clang, that means
+ // -msse2 is on, which means the compiler is allowed to use SSE2
+ // instructions at will, and so are we.
+ return 1;
+}
+#endif
+
+#endif
+#endif
+
+// ARM NEON
+#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
+#undef STBI_NEON
+#endif
+
+#ifdef STBI_NEON
+#include <arm_neon.h>
+#ifdef _MSC_VER
+#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
+#else
+#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
+#endif
+#endif
+
+#ifndef STBI_SIMD_ALIGN
+#define STBI_SIMD_ALIGN(type, name) type name
+#endif
+
+#ifndef STBI_MAX_DIMENSIONS
+#define STBI_MAX_DIMENSIONS (1 << 24)
+#endif
+
+///////////////////////////////////////////////
+//
+// stbi__context struct and start_xxx functions
+
+// stbi__context structure is our basic context used by all images, so it
+// contains all the IO context, plus some basic image information
+typedef struct
+{
+ stbi__uint32 img_x, img_y;
+ int img_n, img_out_n;
+
+ stbi_io_callbacks io;
+ void *io_user_data;
+
+ int read_from_callbacks;
+ int buflen;
+ stbi_uc buffer_start[128];
+ int callback_already_read;
+
+ stbi_uc *img_buffer, *img_buffer_end;
+ stbi_uc *img_buffer_original, *img_buffer_original_end;
+} stbi__context;
+
+
+static void stbi__refill_buffer(stbi__context *s);
+
+// initialize a memory-decode context
+static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
+{
+ s->io.read = NULL;
+ s->read_from_callbacks = 0;
+ s->callback_already_read = 0;
+ s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
+ s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
+}
+
+// initialize a callback-based context
+static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
+{
+ s->io = *c;
+ s->io_user_data = user;
+ s->buflen = sizeof(s->buffer_start);
+ s->read_from_callbacks = 1;
+ s->callback_already_read = 0;
+ s->img_buffer = s->img_buffer_original = s->buffer_start;
+ stbi__refill_buffer(s);
+ s->img_buffer_original_end = s->img_buffer_end;
+}
+
+#ifndef STBI_NO_STDIO
+
+static int stbi__stdio_read(void *user, char *data, int size)
+{
+ return (int) fread(data,1,size,(FILE*) user);
+}
+
+static void stbi__stdio_skip(void *user, int n)
+{
+ int ch;
+ fseek((FILE*) user, n, SEEK_CUR);
+ ch = fgetc((FILE*) user); /* have to read a byte to reset feof()'s flag */
+ if (ch != EOF) {
+ ungetc(ch, (FILE *) user); /* push byte back onto stream if valid. */
+ }
+}
+
+static int stbi__stdio_eof(void *user)
+{
+ return feof((FILE*) user) || ferror((FILE *) user);
+}
+
+static stbi_io_callbacks stbi__stdio_callbacks =
+{
+ stbi__stdio_read,
+ stbi__stdio_skip,
+ stbi__stdio_eof,
+};
+
+static void stbi__start_file(stbi__context *s, FILE *f)
+{
+ stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
+}
+
+//static void stop_file(stbi__context *s) { }
+
+#endif // !STBI_NO_STDIO
+
+static void stbi__rewind(stbi__context *s)
+{
+ // conceptually rewind SHOULD rewind to the beginning of the stream,
+ // but we just rewind to the beginning of the initial buffer, because
+ // we only use it after doing 'test', which only ever looks at at most 92 bytes
+ s->img_buffer = s->img_buffer_original;
+ s->img_buffer_end = s->img_buffer_original_end;
+}
+
+enum
+{
+ STBI_ORDER_RGB,
+ STBI_ORDER_BGR
+};
+
+typedef struct
+{
+ int bits_per_channel;
+ int num_channels;
+ int channel_order;
+} stbi__result_info;
+
+#ifndef STBI_NO_JPEG
+static int stbi__jpeg_test(stbi__context *s);
+static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PNG
+static int stbi__png_test(stbi__context *s);
+static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
+static int stbi__png_is16(stbi__context *s);
+#endif
+
+#ifndef STBI_NO_BMP
+static int stbi__bmp_test(stbi__context *s);
+static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_TGA
+static int stbi__tga_test(stbi__context *s);
+static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PSD
+static int stbi__psd_test(stbi__context *s);
+static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
+static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
+static int stbi__psd_is16(stbi__context *s);
+#endif
+
+#ifndef STBI_NO_HDR
+static int stbi__hdr_test(stbi__context *s);
+static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PIC
+static int stbi__pic_test(stbi__context *s);
+static void *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_GIF
+static int stbi__gif_test(stbi__context *s);
+static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
+static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PNM
+static int stbi__pnm_test(stbi__context *s);
+static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
+static int stbi__pnm_is16(stbi__context *s);
+#endif
+
+static
+#ifdef STBI_THREAD_LOCAL
+STBI_THREAD_LOCAL
+#endif
+const char *stbi__g_failure_reason;
+
+STBIDEF const char *stbi_failure_reason(void)
+{
+ return stbi__g_failure_reason;
+}
+
+#ifndef STBI_NO_FAILURE_STRINGS
+static int stbi__err(const char *str)
+{
+ stbi__g_failure_reason = str;
+ return 0;
+}
+#endif
+
+static void *stbi__malloc(size_t size)
+{
+ return STBI_MALLOC(size);
+}
+
+// stb_image uses ints pervasively, including for offset calculations.
+// therefore the largest decoded image size we can support with the
+// current code, even on 64-bit targets, is INT_MAX. this is not a
+// significant limitation for the intended use case.
+//
+// we do, however, need to make sure our size calculations don't
+// overflow. hence a few helper functions for size calculations that
+// multiply integers together, making sure that they're non-negative
+// and no overflow occurs.
+
+// return 1 if the sum is valid, 0 on overflow.
+// negative terms are considered invalid.
+static int stbi__addsizes_valid(int a, int b)
+{
+ if (b < 0) return 0;
+ // now 0 <= b <= INT_MAX, hence also
+ // 0 <= INT_MAX - b <= INT_MAX.
+ // And "a + b <= INT_MAX" (which might overflow) is the
+ // same as a <= INT_MAX - b (no overflow)
+ return a <= INT_MAX - b;
+}
+
+// returns 1 if the product is valid, 0 on overflow.
+// negative factors are considered invalid.
+static int stbi__mul2sizes_valid(int a, int b)
+{
+ if (a < 0 || b < 0) return 0;
+ if (b == 0) return 1; // mul-by-0 is always safe
+ // portable way to check for no overflows in a*b
+ return a <= INT_MAX/b;
+}
+
+#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
+// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
+static int stbi__mad2sizes_valid(int a, int b, int add)
+{
+ return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
+}
+#endif
+
+// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
+static int stbi__mad3sizes_valid(int a, int b, int c, int add)
+{
+ return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
+ stbi__addsizes_valid(a*b*c, add);
+}
+
+// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
+#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
+static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
+{
+ return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
+ stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
+}
+#endif
+
+#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
+// mallocs with size overflow checking
+static void *stbi__malloc_mad2(int a, int b, int add)
+{
+ if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
+ return stbi__malloc(a*b + add);
+}
+#endif
+
+static void *stbi__malloc_mad3(int a, int b, int c, int add)
+{
+ if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
+ return stbi__malloc(a*b*c + add);
+}
+
+#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
+static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
+{
+ if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
+ return stbi__malloc(a*b*c*d + add);
+}
+#endif
+
+// returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow.
+static int stbi__addints_valid(int a, int b)
+{
+ if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow
+ if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
+ return a <= INT_MAX - b;
+}
+
+// returns 1 if the product of two ints fits in a signed short, 0 on overflow.
+static int stbi__mul2shorts_valid(int a, int b)
+{
+ if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
+ if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid
+ if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
+ return a >= SHRT_MIN / b;
+}
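+
+// (illustrative note: for a = 200, b = -300 the product -60000 is below
+// SHRT_MIN; the b < 0 branch tests 200 <= SHRT_MIN / -300 = 109, which is
+// false, so the out-of-range product is rejected without computing it.)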
+
+// stbi__err - error
+// stbi__errpf - error returning pointer to float
+// stbi__errpuc - error returning pointer to unsigned char
+
+#ifdef STBI_NO_FAILURE_STRINGS
+ #define stbi__err(x,y) 0
+#elif defined(STBI_FAILURE_USERMSG)
+ #define stbi__err(x,y) stbi__err(y)
+#else
+ #define stbi__err(x,y) stbi__err(x)
+#endif
+
+#define stbi__errpf(x,y) ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
+#define stbi__errpuc(x,y) ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
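+
+// (usage sketch: decoders report failure through these macros, e.g.
+// 'return stbi__errpuc("bad bpp", "Corrupt BMP");' where the terse string is
+// the default message, the verbose one is selected by STBI_FAILURE_USERMSG,
+// and both compile away to a bare NULL under STBI_NO_FAILURE_STRINGS.)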
+
+STBIDEF void stbi_image_free(void *retval_from_stbi_load)
+{
+ STBI_FREE(retval_from_stbi_load);
+}
+
+#ifndef STBI_NO_LINEAR
+static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
+#endif
+
+#ifndef STBI_NO_HDR
+static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp);
+#endif
+
+static int stbi__vertically_flip_on_load_global = 0;
+
+STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
+{
+ stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
+}
+
+#ifndef STBI_THREAD_LOCAL
+#define stbi__vertically_flip_on_load stbi__vertically_flip_on_load_global
+#else
+static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
+
+STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
+{
+ stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
+ stbi__vertically_flip_on_load_set = 1;
+}
+
+#define stbi__vertically_flip_on_load (stbi__vertically_flip_on_load_set \
+ ? stbi__vertically_flip_on_load_local \
+ : stbi__vertically_flip_on_load_global)
+#endif // STBI_THREAD_LOCAL
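+
+// (usage sketch: callers control flipping at load time, e.g.
+//    stbi_set_flip_vertically_on_load(1);        // process-wide default
+//    stbi_set_flip_vertically_on_load_thread(1); // this thread only, when built with STBI_THREAD_LOCAL
+// once the per-thread flag has been set, it takes priority over the global.)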
+
+static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
+{
+ memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
+ ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
+ ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
+ ri->num_channels = 0;
+
+ // test the formats with a very explicit header first (at least a FOURCC
+ // or a distinctive magic number)
+ #ifndef STBI_NO_PNG
+ if (stbi__png_test(s)) return stbi__png_load(s,x,y,comp,req_comp, ri);
+ #endif
+ #ifndef STBI_NO_BMP
+ if (stbi__bmp_test(s)) return stbi__bmp_load(s,x,y,comp,req_comp, ri);
+ #endif
+ #ifndef STBI_NO_GIF
+ if (stbi__gif_test(s)) return stbi__gif_load(s,x,y,comp,req_comp, ri);
+ #endif
+ #ifndef STBI_NO_PSD
+ if (stbi__psd_test(s)) return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
+ #else
+ STBI_NOTUSED(bpc);
+ #endif
+ #ifndef STBI_NO_PIC
+ if (stbi__pic_test(s)) return stbi__pic_load(s,x,y,comp,req_comp, ri);
+ #endif
+
+ // then the formats that can end up attempting to load with just 1 or 2
+ // bytes matching expectations; these are prone to false positives, so
+ // try them later
+ #ifndef STBI_NO_JPEG
+ if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
+ #endif
+ #ifndef STBI_NO_PNM
+ if (stbi__pnm_test(s)) return stbi__pnm_load(s,x,y,comp,req_comp, ri);
+ #endif
+
+ #ifndef STBI_NO_HDR
+ if (stbi__hdr_test(s)) {
+ float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
+ return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
+ }
+ #endif
+
+ #ifndef STBI_NO_TGA
+ // test tga last because it's a crappy test!
+ if (stbi__tga_test(s))
+ return stbi__tga_load(s,x,y,comp,req_comp, ri);
+ #endif
+
+ return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
+}
+
+static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
+{
+ int i;
+ int img_len = w * h * channels;
+ stbi_uc *reduced;
+
+ reduced = (stbi_uc *) stbi__malloc(img_len);
+ if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
+
+ for (i = 0; i < img_len; ++i)
+ reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
+
+ STBI_FREE(orig);
+ return reduced;
+}
+
+static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
+{
+ int i;
+ int img_len = w * h * channels;
+ stbi__uint16 *enlarged;
+
+ enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
+ if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
+
+ for (i = 0; i < img_len; ++i)
+ enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
+
+ STBI_FREE(orig);
+ return enlarged;
+}
+
+static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
+{
+ int row;
+ size_t bytes_per_row = (size_t)w * bytes_per_pixel;
+ stbi_uc temp[2048];
+ stbi_uc *bytes = (stbi_uc *)image;
+
+ for (row = 0; row < (h>>1); row++) {
+ stbi_uc *row0 = bytes + row*bytes_per_row;
+ stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
+ // swap row0 with row1
+ size_t bytes_left = bytes_per_row;
+ while (bytes_left) {
+ size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
+ memcpy(temp, row0, bytes_copy);
+ memcpy(row0, row1, bytes_copy);
+ memcpy(row1, temp, bytes_copy);
+ row0 += bytes_copy;
+ row1 += bytes_copy;
+ bytes_left -= bytes_copy;
+ }
+ }
+}
+
+#ifndef STBI_NO_GIF
+static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
+{
+ int slice;
+ int slice_size = w * h * bytes_per_pixel;
+
+ stbi_uc *bytes = (stbi_uc *)image;
+ for (slice = 0; slice < z; ++slice) {
+ stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
+ bytes += slice_size;
+ }
+}
+#endif
+
+static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__result_info ri;
+ void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
+
+ if (result == NULL)
+ return NULL;
+
+ // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
+ STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
+
+ if (ri.bits_per_channel != 8) {
+ result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
+ ri.bits_per_channel = 8;
+ }
+
+ // @TODO: move stbi__convert_format to here
+
+ if (stbi__vertically_flip_on_load) {
+ int channels = req_comp ? req_comp : *comp;
+ stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
+ }
+
+ return (unsigned char *) result;
+}
+
+static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__result_info ri;
+ void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
+
+ if (result == NULL)
+ return NULL;
+
+ // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
+ STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
+
+ if (ri.bits_per_channel != 16) {
+ result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
+ ri.bits_per_channel = 16;
+ }
+
+ // @TODO: move stbi__convert_format16 to here
+ // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
+
+ if (stbi__vertically_flip_on_load) {
+ int channels = req_comp ? req_comp : *comp;
+ stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
+ }
+
+ return (stbi__uint16 *) result;
+}
+
+#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
+static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
+{
+ if (stbi__vertically_flip_on_load && result != NULL) {
+ int channels = req_comp ? req_comp : *comp;
+ stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
+ }
+}
+#endif
+
+#ifndef STBI_NO_STDIO
+
+#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
+STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
+STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
+#endif
+
+#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
+STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
+{
+ return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
+}
+#endif
+
+static FILE *stbi__fopen(char const *filename, char const *mode)
+{
+ FILE *f;
+#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
+ wchar_t wMode[64];
+ wchar_t wFilename[1024];
+ if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
+ return 0;
+
+ if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
+ return 0;
+
+#if defined(_MSC_VER) && _MSC_VER >= 1400
+ if (0 != _wfopen_s(&f, wFilename, wMode))
+ f = 0;
+#else
+ f = _wfopen(wFilename, wMode);
+#endif
+
+#elif defined(_MSC_VER) && _MSC_VER >= 1400
+ if (0 != fopen_s(&f, filename, mode))
+ f=0;
+#else
+ f = fopen(filename, mode);
+#endif
+ return f;
+}
+
+
+STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+ FILE *f = stbi__fopen(filename, "rb");
+ unsigned char *result;
+ if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
+ result = stbi_load_from_file(f,x,y,comp,req_comp);
+ fclose(f);
+ return result;
+}
+
+STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+ unsigned char *result;
+ stbi__context s;
+ stbi__start_file(&s,f);
+ result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
+ if (result) {
+ // need to 'unget' all the characters in the IO buffer
+ fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
+ }
+ return result;
+}
+
+STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__uint16 *result;
+ stbi__context s;
+ stbi__start_file(&s,f);
+ result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
+ if (result) {
+ // need to 'unget' all the characters in the IO buffer
+ fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
+ }
+ return result;
+}
+
+STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+ FILE *f = stbi__fopen(filename, "rb");
+ stbi__uint16 *result;
+ if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
+ result = stbi_load_from_file_16(f,x,y,comp,req_comp);
+ fclose(f);
+ return result;
+}
+
+
+#endif //!STBI_NO_STDIO
+
+STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
+{
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+ return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
+}
+
+STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
+{
+ stbi__context s;
+ stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
+ return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
+}
+
+STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+ return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
+}
+
+STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__context s;
+ stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+ return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
+}
+
+#ifndef STBI_NO_GIF
+STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
+{
+ unsigned char *result;
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+
+ result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
+ if (stbi__vertically_flip_on_load) {
+ stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
+ }
+
+ return result;
+}
+#endif
+
+#ifndef STBI_NO_LINEAR
+static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
+{
+ unsigned char *data;
+ #ifndef STBI_NO_HDR
+ if (stbi__hdr_test(s)) {
+ stbi__result_info ri;
+ float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
+ if (hdr_data)
+ stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
+ return hdr_data;
+ }
+ #endif
+ data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
+ if (data)
+ return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
+ return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
+}
+
+STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+ return stbi__loadf_main(&s,x,y,comp,req_comp);
+}
+
+STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__context s;
+ stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+ return stbi__loadf_main(&s,x,y,comp,req_comp);
+}
+
+#ifndef STBI_NO_STDIO
+STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+ float *result;
+ FILE *f = stbi__fopen(filename, "rb");
+ if (!f) return stbi__errpf("can't fopen", "Unable to open file");
+ result = stbi_loadf_from_file(f,x,y,comp,req_comp);
+ fclose(f);
+ return result;
+}
+
+STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+ stbi__context s;
+ stbi__start_file(&s,f);
+ return stbi__loadf_main(&s,x,y,comp,req_comp);
+}
+#endif // !STBI_NO_STDIO
+
+#endif // !STBI_NO_LINEAR
+
+// these is-hdr-or-not functions are defined independently of whether
+// STBI_NO_LINEAR is defined, for API simplicity; if STBI_NO_LINEAR is
+// defined, they always report false!
+
+STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
+{
+ #ifndef STBI_NO_HDR
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+ return stbi__hdr_test(&s);
+ #else
+ STBI_NOTUSED(buffer);
+ STBI_NOTUSED(len);
+ return 0;
+ #endif
+}
+
+#ifndef STBI_NO_STDIO
+STBIDEF int stbi_is_hdr (char const *filename)
+{
+ FILE *f = stbi__fopen(filename, "rb");
+ int result=0;
+ if (f) {
+ result = stbi_is_hdr_from_file(f);
+ fclose(f);
+ }
+ return result;
+}
+
+STBIDEF int stbi_is_hdr_from_file(FILE *f)
+{
+ #ifndef STBI_NO_HDR
+ long pos = ftell(f);
+ int res;
+ stbi__context s;
+ stbi__start_file(&s,f);
+ res = stbi__hdr_test(&s);
+ fseek(f, pos, SEEK_SET);
+ return res;
+ #else
+ STBI_NOTUSED(f);
+ return 0;
+ #endif
+}
+#endif // !STBI_NO_STDIO
+
+STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
+{
+ #ifndef STBI_NO_HDR
+ stbi__context s;
+ stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+ return stbi__hdr_test(&s);
+ #else
+ STBI_NOTUSED(clbk);
+ STBI_NOTUSED(user);
+ return 0;
+ #endif
+}
+
+#ifndef STBI_NO_LINEAR
+static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
+
+STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
+STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
+#endif
+
+static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
+
+STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
+STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Common code used by all image loaders
+//
+
+enum
+{
+ STBI__SCAN_load=0,
+ STBI__SCAN_type,
+ STBI__SCAN_header
+};
+
+static void stbi__refill_buffer(stbi__context *s)
+{
+ int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
+ s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
+ if (n == 0) {
+ // at end of file, treat same as if from memory, but need to handle case
+ // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
+ s->read_from_callbacks = 0;
+ s->img_buffer = s->buffer_start;
+ s->img_buffer_end = s->buffer_start+1;
+ *s->img_buffer = 0;
+ } else {
+ s->img_buffer = s->buffer_start;
+ s->img_buffer_end = s->buffer_start + n;
+ }
+}
+
+stbi_inline static stbi_uc stbi__get8(stbi__context *s)
+{
+ if (s->img_buffer < s->img_buffer_end)
+ return *s->img_buffer++;
+ if (s->read_from_callbacks) {
+ stbi__refill_buffer(s);
+ return *s->img_buffer++;
+ }
+ return 0;
+}
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
+// nothing
+#else
+stbi_inline static int stbi__at_eof(stbi__context *s)
+{
+ if (s->io.read) {
+ if (!(s->io.eof)(s->io_user_data)) return 0;
+ // if feof() is true, check whether buffer == end
+ // special case: we've only got the special 0 character at the end
+ if (s->read_from_callbacks == 0) return 1;
+ }
+
+ return s->img_buffer >= s->img_buffer_end;
+}
+#endif
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
+// nothing
+#else
+static void stbi__skip(stbi__context *s, int n)
+{
+ if (n == 0) return; // already there!
+ if (n < 0) {
+ s->img_buffer = s->img_buffer_end;
+ return;
+ }
+ if (s->io.read) {
+ int blen = (int) (s->img_buffer_end - s->img_buffer);
+ if (blen < n) {
+ s->img_buffer = s->img_buffer_end;
+ (s->io.skip)(s->io_user_data, n - blen);
+ return;
+ }
+ }
+ s->img_buffer += n;
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
+// nothing
+#else
+static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
+{
+ if (s->io.read) {
+ int blen = (int) (s->img_buffer_end - s->img_buffer);
+ if (blen < n) {
+ int res, count;
+
+ memcpy(buffer, s->img_buffer, blen);
+
+ count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
+ res = (count == (n-blen));
+ s->img_buffer = s->img_buffer_end;
+ return res;
+ }
+ }
+
+ if (s->img_buffer+n <= s->img_buffer_end) {
+ memcpy(buffer, s->img_buffer, n);
+ s->img_buffer += n;
+ return 1;
+ } else
+ return 0;
+}
+#endif
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
+// nothing
+#else
+static int stbi__get16be(stbi__context *s)
+{
+ int z = stbi__get8(s);
+ return (z << 8) + stbi__get8(s);
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
+// nothing
+#else
+static stbi__uint32 stbi__get32be(stbi__context *s)
+{
+ stbi__uint32 z = stbi__get16be(s);
+ return (z << 16) + stbi__get16be(s);
+}
+#endif
+
+#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
+// nothing
+#else
+static int stbi__get16le(stbi__context *s)
+{
+ int z = stbi__get8(s);
+ return z + (stbi__get8(s) << 8);
+}
+#endif
+
+#ifndef STBI_NO_BMP
+static stbi__uint32 stbi__get32le(stbi__context *s)
+{
+ stbi__uint32 z = stbi__get16le(s);
+ z += (stbi__uint32)stbi__get16le(s) << 16;
+ return z;
+}
+#endif
+
+#define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte without warnings
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
+// nothing
+#else
+//////////////////////////////////////////////////////////////////////////////
+//
+// generic converter from built-in img_n to req_comp
+// individual types do this automatically as much as possible (e.g. jpeg
+// does all cases internally since it needs to colorspace convert anyway,
+// and it never has alpha, so very few cases). png can automatically
+// interleave an alpha=255 channel, but falls back to this for other cases
+//
+// assume data buffer is malloced, so malloc a new one and free that one
+// only failure mode is malloc failing
+
+static stbi_uc stbi__compute_y(int r, int g, int b)
+{
+ return (stbi_uc) (((r*77) + (g*150) + (29*b)) >> 8);
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
+// nothing
+#else
+static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
+{
+ int i,j;
+ unsigned char *good;
+
+ if (req_comp == img_n) return data;
+ STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
+
+ good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
+ if (good == NULL) {
+ STBI_FREE(data);
+ return stbi__errpuc("outofmem", "Out of memory");
+ }
+
+ for (j=0; j < (int) y; ++j) {
+ unsigned char *src = data + j * x * img_n ;
+ unsigned char *dest = good + j * x * req_comp;
+
+ #define STBI__COMBO(a,b) ((a)*8+(b))
+ #define STBI__CASE(a,b) case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
+ // convert source image with img_n components to one with req_comp components;
+ // avoid switch per pixel, so use switch per scanline and massive macros
+ switch (STBI__COMBO(img_n, req_comp)) {
+ STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255; } break;
+ STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
+ STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255; } break;
+ STBI__CASE(2,1) { dest[0]=src[0]; } break;
+ STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
+ STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1]; } break;
+ STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255; } break;
+ STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); } break;
+ STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255; } break;
+ STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); } break;
+ STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
+ STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2]; } break;
+ default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
+ }
+ #undef STBI__CASE
+ }
+
+ STBI_FREE(data);
+ return good;
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
+// nothing
+#else
+static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
+{
+ return (stbi__uint16) (((r*77) + (g*150) + (29*b)) >> 8);
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
+// nothing
+#else
+static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
+{
+ int i,j;
+ stbi__uint16 *good;
+
+ if (req_comp == img_n) return data;
+ STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
+
+ good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
+ if (good == NULL) {
+ STBI_FREE(data);
+ return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
+ }
+
+ for (j=0; j < (int) y; ++j) {
+ stbi__uint16 *src = data + j * x * img_n ;
+ stbi__uint16 *dest = good + j * x * req_comp;
+
+ #define STBI__COMBO(a,b) ((a)*8+(b))
+ #define STBI__CASE(a,b) case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
+ // convert source image with img_n components to one with req_comp components;
+ // avoid switch per pixel, so use switch per scanline and massive macros
+ switch (STBI__COMBO(img_n, req_comp)) {
+ STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff; } break;
+ STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
+ STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff; } break;
+ STBI__CASE(2,1) { dest[0]=src[0]; } break;
+ STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
+ STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1]; } break;
+ STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff; } break;
+ STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); } break;
+ STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
+ STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); } break;
+ STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
+ STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2]; } break;
+ default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
+ }
+ #undef STBI__CASE
+ }
+
+ STBI_FREE(data);
+ return good;
+}
+#endif
+
+#ifndef STBI_NO_LINEAR
+static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
+{
+ int i,k,n;
+ float *output;
+ if (!data) return NULL;
+ output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
+ if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
+ // compute number of non-alpha components
+ if (comp & 1) n = comp; else n = comp-1;
+ for (i=0; i < x*y; ++i) {
+ for (k=0; k < n; ++k) {
+ output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
+ }
+ }
+ if (n < comp) {
+ for (i=0; i < x*y; ++i) {
+ output[i*comp + n] = data[i*comp + n]/255.0f;
+ }
+ }
+ STBI_FREE(data);
+ return output;
+}
+#endif
+
+#ifndef STBI_NO_HDR
+#define stbi__float2int(x) ((int) (x))
+static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp)
+{
+ int i,k,n;
+ stbi_uc *output;
+ if (!data) return NULL;
+ output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
+ if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
+ // compute number of non-alpha components
+ if (comp & 1) n = comp; else n = comp-1;
+ for (i=0; i < x*y; ++i) {
+ for (k=0; k < n; ++k) {
+ float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
+ if (z < 0) z = 0;
+ if (z > 255) z = 255;
+ output[i*comp + k] = (stbi_uc) stbi__float2int(z);
+ }
+ if (k < comp) {
+ float z = data[i*comp+k] * 255 + 0.5f;
+ if (z < 0) z = 0;
+ if (z > 255) z = 255;
+ output[i*comp + k] = (stbi_uc) stbi__float2int(z);
+ }
+ }
+ STBI_FREE(data);
+ return output;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// "baseline" JPEG/JFIF decoder
+//
+// simple implementation
+// - doesn't support delayed output of y-dimension
+// - simple interface (only one output format: 8-bit interleaved RGB)
+// - doesn't try to recover corrupt jpegs
+// - doesn't allow partial loading, loading multiple at once
+// - still fast on x86 (copying globals into locals doesn't help x86)
+// - allocates lots of intermediate memory (full size of all components)
+// - non-interleaved case requires this anyway
+// - allows good upsampling (see next)
+// high-quality
+// - upsampled channels are bilinearly interpolated, even across blocks
+// - quality integer IDCT derived from IJG's 'slow'
+// performance
+// - fast huffman; reasonable integer IDCT
+// - some SIMD kernels for common paths on targets with SSE2/NEON
+// - uses a lot of intermediate memory, could cache poorly
+
+#ifndef STBI_NO_JPEG
+
+// huffman decoding acceleration
+#define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
+
+typedef struct
+{
+ stbi_uc fast[1 << FAST_BITS];
+ // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
+ stbi__uint16 code[256];
+ stbi_uc values[256];
+ stbi_uc size[257];
+ unsigned int maxcode[18];
+ int delta[17]; // old 'firstsymbol' - old 'firstcode'
+} stbi__huffman;
+
+typedef struct
+{
+ stbi__context *s;
+ stbi__huffman huff_dc[4];
+ stbi__huffman huff_ac[4];
+ stbi__uint16 dequant[4][64];
+ stbi__int16 fast_ac[4][1 << FAST_BITS];
+
+// sizes for components, interleaved MCUs
+ int img_h_max, img_v_max;
+ int img_mcu_x, img_mcu_y;
+ int img_mcu_w, img_mcu_h;
+
+// definition of jpeg image component
+ struct
+ {
+ int id;
+ int h,v;
+ int tq;
+ int hd,ha;
+ int dc_pred;
+
+ int x,y,w2,h2;
+ stbi_uc *data;
+ void *raw_data, *raw_coeff;
+ stbi_uc *linebuf;
+ short *coeff; // progressive only
+ int coeff_w, coeff_h; // number of 8x8 coefficient blocks
+ } img_comp[4];
+
+ stbi__uint32 code_buffer; // jpeg entropy-coded buffer
+ int code_bits; // number of valid bits
+ unsigned char marker; // marker seen while filling entropy buffer
+ int nomore; // flag if we saw a marker so must stop
+
+ int progressive;
+ int spec_start;
+ int spec_end;
+ int succ_high;
+ int succ_low;
+ int eob_run;
+ int jfif;
+ int app14_color_transform; // Adobe APP14 tag
+ int rgb;
+
+ int scan_n, order[4];
+ int restart_interval, todo;
+
+// kernels
+ void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
+ void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
+ stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
+} stbi__jpeg;
+
+static int stbi__build_huffman(stbi__huffman *h, int *count)
+{
+ int i,j,k=0;
+ unsigned int code;
+ // build size list for each symbol (from JPEG spec)
+ for (i=0; i < 16; ++i) {
+ for (j=0; j < count[i]; ++j) {
+ h->size[k++] = (stbi_uc) (i+1);
+ if(k >= 257) return stbi__err("bad size list","Corrupt JPEG");
+ }
+ }
+ h->size[k] = 0;
+
+ // compute actual symbols (from jpeg spec)
+ code = 0;
+ k = 0;
+ for(j=1; j <= 16; ++j) {
+ // compute delta to add to code to compute symbol id
+ h->delta[j] = k - code;
+ if (h->size[k] == j) {
+ while (h->size[k] == j)
+ h->code[k++] = (stbi__uint16) (code++);
+ if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
+ }
+ // compute largest code + 1 for this size, preshifted as needed later
+ h->maxcode[j] = code << (16-j);
+ code <<= 1;
+ }
+ h->maxcode[j] = 0xffffffff;
+
+ // build non-spec acceleration table; 255 is flag for not-accelerated
+ memset(h->fast, 255, 1 << FAST_BITS);
+ for (i=0; i < k; ++i) {
+ int s = h->size[i];
+ if (s <= FAST_BITS) {
+ int c = h->code[i] << (FAST_BITS-s);
+ int m = 1 << (FAST_BITS-s);
+ for (j=0; j < m; ++j) {
+ h->fast[c+j] = (stbi_uc) i;
+ }
+ }
+ }
+ return 1;
+}
+
+// build a table that decodes both magnitude and value of small ACs in
+// one go.
+static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
+{
+ int i;
+ for (i=0; i < (1 << FAST_BITS); ++i) {
+ stbi_uc fast = h->fast[i];
+ fast_ac[i] = 0;
+ if (fast < 255) {
+ int rs = h->values[fast];
+ int run = (rs >> 4) & 15;
+ int magbits = rs & 15;
+ int len = h->size[fast];
+
+ if (magbits && len + magbits <= FAST_BITS) {
+ // magnitude code followed by receive_extend code
+ int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
+ int m = 1 << (magbits - 1);
+ if (k < m) k += (~0U << magbits) + 1;
+ // if the result is small enough, we can fit it in fast_ac table
+ if (k >= -128 && k <= 127)
+ fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
+ }
+ }
+ }
+}
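+// worked example of the packing above, assuming FAST_BITS >= 6: a 4-bit code
+// for rs=0x12 (run 1, 2 magnitude bits) followed by the bits 01 gives
+// k = 1, then k += (~0U << 2) + 1 = -3, so the value is -2; the table entry
+// is (-2)*256 + 1*16 + (4+2) = -490.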
+
+static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
+{
+ do {
+ unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
+ if (b == 0xff) {
+ int c = stbi__get8(j->s);
+ while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
+ if (c != 0) {
+ j->marker = (unsigned char) c;
+ j->nomore = 1;
+ return;
+ }
+ }
+ j->code_buffer |= b << (24 - j->code_bits);
+ j->code_bits += 8;
+ } while (j->code_bits <= 24);
+}
+
+// (1 << n) - 1
+static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
+
+// decode a jpeg huffman value from the bitstream
+stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
+{
+ unsigned int temp;
+ int c,k;
+
+ if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+
+ // look at the top FAST_BITS and determine what symbol ID it is,
+ // if the code is <= FAST_BITS
+ c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+ k = h->fast[c];
+ if (k < 255) {
+ int s = h->size[k];
+ if (s > j->code_bits)
+ return -1;
+ j->code_buffer <<= s;
+ j->code_bits -= s;
+ return h->values[k];
+ }
+
+ // naive test is to shift the code_buffer down so k bits are
+ // valid, then test against maxcode. To speed this up, we've
+ // preshifted maxcode left so that it has (16-k) 0s at the
+ // end; in other words, regardless of the number of bits, it
+ // wants to be compared against something shifted to have 16 bits;
+ // that way we don't need to shift inside the loop.
+ temp = j->code_buffer >> 16;
+ for (k=FAST_BITS+1 ; ; ++k)
+ if (temp < h->maxcode[k])
+ break;
+ if (k == 17) {
+ // error! code not found
+ j->code_bits -= 16;
+ return -1;
+ }
+
+ if (k > j->code_bits)
+ return -1;
+
+ // convert the huffman code to the symbol id
+ c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
+ if(c < 0 || c >= 256) // symbol id out of bounds!
+ return -1;
+ STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
+
+ // convert the id to a symbol
+ j->code_bits -= k;
+ j->code_buffer <<= k;
+ return h->values[c];
+}
+
+// bias[n] = (-1<<n) + 1
+static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
+
+// combined JPEG 'receive' and JPEG 'extend', since baseline
+// always extends everything it receives.
+stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
+{
+ unsigned int k;
+ int sgn;
+ if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
+ if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s instead of continuing
+
+ sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
+ k = stbi_lrot(j->code_buffer, n);
+ j->code_buffer = k & ~stbi__bmask[n];
+ k &= stbi__bmask[n];
+ j->code_bits -= n;
+ return k + (stbi__jbias[n] & (sgn - 1));
+}
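+// worked example of the JPEG EXTEND step above: for n=3, the bits 110 have the
+// sign bit set, so the bias is masked off and the value is +6; the bits 010
+// have the sign bit clear, so the value is 2 + stbi__jbias[3] = 2 - 7 = -5.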
+
+// get some unsigned bits
+stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
+{
+ unsigned int k;
+ if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
+ if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s instead of continuing
+ k = stbi_lrot(j->code_buffer, n);
+ j->code_buffer = k & ~stbi__bmask[n];
+ k &= stbi__bmask[n];
+ j->code_bits -= n;
+ return k;
+}
+
+stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
+{
+ unsigned int k;
+ if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
+ if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s instead of continuing
+ k = j->code_buffer;
+ j->code_buffer <<= 1;
+ --j->code_bits;
+ return k & 0x80000000;
+}
+
+// given a value that's at position X in the zigzag stream,
+// where does it appear in the 8x8 matrix coded as row-major?
+static const stbi_uc stbi__jpeg_dezigzag[64+15] =
+{
+ 0, 1, 8, 16, 9, 2, 3, 10,
+ 17, 24, 32, 25, 18, 11, 4, 5,
+ 12, 19, 26, 33, 40, 48, 41, 34,
+ 27, 20, 13, 6, 7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36,
+ 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46,
+ 53, 60, 61, 54, 47, 55, 62, 63,
+ // let corrupt input sample past end
+ 63, 63, 63, 63, 63, 63, 63, 63,
+ 63, 63, 63, 63, 63, 63, 63
+};
+
+// decode one 64-entry block--
+static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
+{
+ int diff,dc,k;
+ int t;
+
+ if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+ t = stbi__jpeg_huff_decode(j, hdc);
+ if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
+
+ // 0 all the ac values now so we can do it 32-bits at a time
+ memset(data,0,64*sizeof(data[0]));
+
+ diff = t ? stbi__extend_receive(j, t) : 0;
+ if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG");
+ dc = j->img_comp[b].dc_pred + diff;
+ j->img_comp[b].dc_pred = dc;
+ if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+ data[0] = (short) (dc * dequant[0]);
+
+ // decode AC components, see JPEG spec
+ k = 1;
+ do {
+ unsigned int zig;
+ int c,r,s;
+ if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+ c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+ r = fac[c];
+ if (r) { // fast-AC path
+ k += (r >> 4) & 15; // run
+ s = r & 15; // combined length
+ if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
+ j->code_buffer <<= s;
+ j->code_bits -= s;
+ // decode into unzigzag'd location
+ zig = stbi__jpeg_dezigzag[k++];
+ data[zig] = (short) ((r >> 8) * dequant[zig]);
+ } else {
+ int rs = stbi__jpeg_huff_decode(j, hac);
+ if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
+ s = rs & 15;
+ r = rs >> 4;
+ if (s == 0) {
+ if (rs != 0xf0) break; // end block
+ k += 16;
+ } else {
+ k += r;
+ // decode into unzigzag'd location
+ zig = stbi__jpeg_dezigzag[k++];
+ data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
+ }
+ }
+ } while (k < 64);
+ return 1;
+}
+
+static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
+{
+ int diff,dc;
+ int t;
+ if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+
+ if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+
+ if (j->succ_high == 0) {
+ // first scan for DC coefficient, must be first
+ memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
+ t = stbi__jpeg_huff_decode(j, hdc);
+ if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+ diff = t ? stbi__extend_receive(j, t) : 0;
+
+ if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG");
+ dc = j->img_comp[b].dc_pred + diff;
+ j->img_comp[b].dc_pred = dc;
+ if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+ data[0] = (short) (dc * (1 << j->succ_low));
+ } else {
+ // refinement scan for DC coefficient
+ if (stbi__jpeg_get_bit(j))
+ data[0] += (short) (1 << j->succ_low);
+ }
+ return 1;
+}
+
+// @OPTIMIZE: store non-zigzagged during the decode passes,
+// and only de-zigzag when dequantizing
+static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
+{
+ int k;
+ if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+
+ if (j->succ_high == 0) {
+ int shift = j->succ_low;
+
+ if (j->eob_run) {
+ --j->eob_run;
+ return 1;
+ }
+
+ k = j->spec_start;
+ do {
+ unsigned int zig;
+ int c,r,s;
+ if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+ c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+ r = fac[c];
+ if (r) { // fast-AC path
+ k += (r >> 4) & 15; // run
+ s = r & 15; // combined length
+ if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
+ j->code_buffer <<= s;
+ j->code_bits -= s;
+ zig = stbi__jpeg_dezigzag[k++];
+ data[zig] = (short) ((r >> 8) * (1 << shift));
+ } else {
+ int rs = stbi__jpeg_huff_decode(j, hac);
+ if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
+ s = rs & 15;
+ r = rs >> 4;
+ if (s == 0) {
+ if (r < 15) {
+ j->eob_run = (1 << r);
+ if (r)
+ j->eob_run += stbi__jpeg_get_bits(j, r);
+ --j->eob_run;
+ break;
+ }
+ k += 16;
+ } else {
+ k += r;
+ zig = stbi__jpeg_dezigzag[k++];
+ data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
+ }
+ }
+ } while (k <= j->spec_end);
+ } else {
+ // refinement scan for these AC coefficients
+
+ short bit = (short) (1 << j->succ_low);
+
+ if (j->eob_run) {
+ --j->eob_run;
+ for (k = j->spec_start; k <= j->spec_end; ++k) {
+ short *p = &data[stbi__jpeg_dezigzag[k]];
+ if (*p != 0)
+ if (stbi__jpeg_get_bit(j))
+ if ((*p & bit)==0) {
+ if (*p > 0)
+ *p += bit;
+ else
+ *p -= bit;
+ }
+ }
+ } else {
+ k = j->spec_start;
+ do {
+ int r,s;
+ int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
+ if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
+ s = rs & 15;
+ r = rs >> 4;
+ if (s == 0) {
+ if (r < 15) {
+ j->eob_run = (1 << r) - 1;
+ if (r)
+ j->eob_run += stbi__jpeg_get_bits(j, r);
+ r = 64; // force end of block
+ } else {
+ // r=15 s=0 should write 16 0s, so we just do
+ // a run of 15 0s and then write s (which is 0),
+ // so we don't have to do anything special here
+ }
+ } else {
+ if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
+ // sign bit
+ if (stbi__jpeg_get_bit(j))
+ s = bit;
+ else
+ s = -bit;
+ }
+
+ // advance by r
+ while (k <= j->spec_end) {
+ short *p = &data[stbi__jpeg_dezigzag[k++]];
+ if (*p != 0) {
+ if (stbi__jpeg_get_bit(j))
+ if ((*p & bit)==0) {
+ if (*p > 0)
+ *p += bit;
+ else
+ *p -= bit;
+ }
+ } else {
+ if (r == 0) {
+ *p = (short) s;
+ break;
+ }
+ --r;
+ }
+ }
+ } while (k <= j->spec_end);
+ }
+ }
+ return 1;
+}
+
+// take a -128..127 value, clamp it, and convert to 0..255
+stbi_inline static stbi_uc stbi__clamp(int x)
+{
+ // trick to use a single test to catch both cases
+ if ((unsigned int) x > 255) {
+ if (x < 0) return 0;
+ if (x > 255) return 255;
+ }
+ return (stbi_uc) x;
+}
+
+#define stbi__f2f(x) ((int) (((x) * 4096 + 0.5)))
+#define stbi__fsh(x) ((x) * 4096)
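+// both macros keep values in 12-bit fixed point: e.g. stbi__f2f(0.5411961f)
+// evaluates to 2217, and stbi__fsh shifts an already-integer term into the
+// same 1<<12 domain so the two can be mixed in the IDCT below.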
+
+// derived from jidctint -- DCT_ISLOW
+#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
+ int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
+ p2 = s2; \
+ p3 = s6; \
+ p1 = (p2+p3) * stbi__f2f(0.5411961f); \
+ t2 = p1 + p3*stbi__f2f(-1.847759065f); \
+ t3 = p1 + p2*stbi__f2f( 0.765366865f); \
+ p2 = s0; \
+ p3 = s4; \
+ t0 = stbi__fsh(p2+p3); \
+ t1 = stbi__fsh(p2-p3); \
+ x0 = t0+t3; \
+ x3 = t0-t3; \
+ x1 = t1+t2; \
+ x2 = t1-t2; \
+ t0 = s7; \
+ t1 = s5; \
+ t2 = s3; \
+ t3 = s1; \
+ p3 = t0+t2; \
+ p4 = t1+t3; \
+ p1 = t0+t3; \
+ p2 = t1+t2; \
+ p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
+ t0 = t0*stbi__f2f( 0.298631336f); \
+ t1 = t1*stbi__f2f( 2.053119869f); \
+ t2 = t2*stbi__f2f( 3.072711026f); \
+ t3 = t3*stbi__f2f( 1.501321110f); \
+ p1 = p5 + p1*stbi__f2f(-0.899976223f); \
+ p2 = p5 + p2*stbi__f2f(-2.562915447f); \
+ p3 = p3*stbi__f2f(-1.961570560f); \
+ p4 = p4*stbi__f2f(-0.390180644f); \
+ t3 += p1+p4; \
+ t2 += p2+p3; \
+ t1 += p2+p4; \
+ t0 += p1+p3;
+
+static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
+{
+ int i,val[64],*v=val;
+ stbi_uc *o;
+ short *d = data;
+
+ // columns
+ for (i=0; i < 8; ++i,++d, ++v) {
+ // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
+ if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
+ && d[40]==0 && d[48]==0 && d[56]==0) {
+ // no shortcut 0 seconds
+ // (1|2|3|4|5|6|7)==0 0 seconds
+ // all separate -0.047 seconds
+ // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
+ int dcterm = d[0]*4;
+ v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
+ } else {
+ STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
+ // constants scaled things up by 1<<12; let's bring them back
+ // down, but keep 2 extra bits of precision
+ x0 += 512; x1 += 512; x2 += 512; x3 += 512;
+ v[ 0] = (x0+t3) >> 10;
+ v[56] = (x0-t3) >> 10;
+ v[ 8] = (x1+t2) >> 10;
+ v[48] = (x1-t2) >> 10;
+ v[16] = (x2+t1) >> 10;
+ v[40] = (x2-t1) >> 10;
+ v[24] = (x3+t0) >> 10;
+ v[32] = (x3-t0) >> 10;
+ }
+ }
+
+ for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
+ // no fast case since the first 1D IDCT spread components out
+ STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
+ // constants scaled things up by 1<<12, plus we had 1<<2 from first
+ // loop, plus horizontal and vertical each scale by sqrt(8) so together
+ // we've got an extra 1<<3, so 1<<17 total we need to remove.
+ // so we want to round that, which means adding 0.5 * 1<<17,
+ // aka 65536. Also, we'll end up with -128 to 127 that we want
+ // to encode as 0..255 by adding 128, so we'll add that before the shift
+ x0 += 65536 + (128<<17);
+ x1 += 65536 + (128<<17);
+ x2 += 65536 + (128<<17);
+ x3 += 65536 + (128<<17);
+ // tried computing the shifts into temps, or'ing the temps to see
+ // if any were out of range, but that was slower
+ o[0] = stbi__clamp((x0+t3) >> 17);
+ o[7] = stbi__clamp((x0-t3) >> 17);
+ o[1] = stbi__clamp((x1+t2) >> 17);
+ o[6] = stbi__clamp((x1-t2) >> 17);
+ o[2] = stbi__clamp((x2+t1) >> 17);
+ o[5] = stbi__clamp((x2-t1) >> 17);
+ o[3] = stbi__clamp((x3+t0) >> 17);
+ o[4] = stbi__clamp((x3-t0) >> 17);
+ }
+}
+
+#ifdef STBI_SSE2
+// sse2 integer IDCT. not the fastest possible implementation but it
+// produces bit-identical results to the generic C version so it's
+// fully "transparent".
+static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
+{
+ // This is constructed to match our regular (generic) integer IDCT exactly.
+ __m128i row0, row1, row2, row3, row4, row5, row6, row7;
+ __m128i tmp;
+
+ // dot product constant: even elems=x, odd elems=y
+ #define dct_const(x,y) _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
+
+ // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
+ // out(1) = c1[even]*x + c1[odd]*y
+ #define dct_rot(out0,out1, x,y,c0,c1) \
+ __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
+ __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
+ __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
+ __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
+ __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
+ __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
+
+ // out = in << 12 (in 16-bit, out 32-bit)
+ #define dct_widen(out, in) \
+ __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
+ __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
+
+ // wide add
+ #define dct_wadd(out, a, b) \
+ __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
+ __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
+
+ // wide sub
+ #define dct_wsub(out, a, b) \
+ __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
+ __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
+
+ // butterfly a/b, add bias, then shift by "s" and pack
+ #define dct_bfly32o(out0, out1, a,b,bias,s) \
+ { \
+ __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
+ __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
+ dct_wadd(sum, abiased, b); \
+ dct_wsub(dif, abiased, b); \
+ out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
+ out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
+ }
+
+ // 8-bit interleave step (for transposes)
+ #define dct_interleave8(a, b) \
+ tmp = a; \
+ a = _mm_unpacklo_epi8(a, b); \
+ b = _mm_unpackhi_epi8(tmp, b)
+
+ // 16-bit interleave step (for transposes)
+ #define dct_interleave16(a, b) \
+ tmp = a; \
+ a = _mm_unpacklo_epi16(a, b); \
+ b = _mm_unpackhi_epi16(tmp, b)
+
+ #define dct_pass(bias,shift) \
+ { \
+ /* even part */ \
+ dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
+ __m128i sum04 = _mm_add_epi16(row0, row4); \
+ __m128i dif04 = _mm_sub_epi16(row0, row4); \
+ dct_widen(t0e, sum04); \
+ dct_widen(t1e, dif04); \
+ dct_wadd(x0, t0e, t3e); \
+ dct_wsub(x3, t0e, t3e); \
+ dct_wadd(x1, t1e, t2e); \
+ dct_wsub(x2, t1e, t2e); \
+ /* odd part */ \
+ dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
+ dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
+ __m128i sum17 = _mm_add_epi16(row1, row7); \
+ __m128i sum35 = _mm_add_epi16(row3, row5); \
+ dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
+ dct_wadd(x4, y0o, y4o); \
+ dct_wadd(x5, y1o, y5o); \
+ dct_wadd(x6, y2o, y5o); \
+ dct_wadd(x7, y3o, y4o); \
+ dct_bfly32o(row0,row7, x0,x7,bias,shift); \
+ dct_bfly32o(row1,row6, x1,x6,bias,shift); \
+ dct_bfly32o(row2,row5, x2,x5,bias,shift); \
+ dct_bfly32o(row3,row4, x3,x4,bias,shift); \
+ }
+
+ __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
+ __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
+ __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
+ __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
+ __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
+ __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
+ __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
+ __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
+
+ // rounding biases in column/row passes, see stbi__idct_block for explanation.
+ __m128i bias_0 = _mm_set1_epi32(512);
+ __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
+
+ // load
+ row0 = _mm_load_si128((const __m128i *) (data + 0*8));
+ row1 = _mm_load_si128((const __m128i *) (data + 1*8));
+ row2 = _mm_load_si128((const __m128i *) (data + 2*8));
+ row3 = _mm_load_si128((const __m128i *) (data + 3*8));
+ row4 = _mm_load_si128((const __m128i *) (data + 4*8));
+ row5 = _mm_load_si128((const __m128i *) (data + 5*8));
+ row6 = _mm_load_si128((const __m128i *) (data + 6*8));
+ row7 = _mm_load_si128((const __m128i *) (data + 7*8));
+
+ // column pass
+ dct_pass(bias_0, 10);
+
+ {
+ // 16bit 8x8 transpose pass 1
+ dct_interleave16(row0, row4);
+ dct_interleave16(row1, row5);
+ dct_interleave16(row2, row6);
+ dct_interleave16(row3, row7);
+
+ // transpose pass 2
+ dct_interleave16(row0, row2);
+ dct_interleave16(row1, row3);
+ dct_interleave16(row4, row6);
+ dct_interleave16(row5, row7);
+
+ // transpose pass 3
+ dct_interleave16(row0, row1);
+ dct_interleave16(row2, row3);
+ dct_interleave16(row4, row5);
+ dct_interleave16(row6, row7);
+ }
+
+ // row pass
+ dct_pass(bias_1, 17);
+
+ {
+ // pack
+ __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
+ __m128i p1 = _mm_packus_epi16(row2, row3);
+ __m128i p2 = _mm_packus_epi16(row4, row5);
+ __m128i p3 = _mm_packus_epi16(row6, row7);
+
+ // 8bit 8x8 transpose pass 1
+ dct_interleave8(p0, p2); // a0e0a1e1...
+ dct_interleave8(p1, p3); // c0g0c1g1...
+
+ // transpose pass 2
+ dct_interleave8(p0, p1); // a0c0e0g0...
+ dct_interleave8(p2, p3); // b0d0f0h0...
+
+ // transpose pass 3
+ dct_interleave8(p0, p2); // a0b0c0d0...
+ dct_interleave8(p1, p3); // a4b4c4d4...
+
+ // store
+ _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
+ _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
+ }
+
+#undef dct_const
+#undef dct_rot
+#undef dct_widen
+#undef dct_wadd
+#undef dct_wsub
+#undef dct_bfly32o
+#undef dct_interleave8
+#undef dct_interleave16
+#undef dct_pass
+}
+
+#endif // STBI_SSE2
+
+#ifdef STBI_NEON
+
+// NEON integer IDCT. should produce bit-identical
+// results to the generic C version.
+static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
+{
+ int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
+
+ int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
+ int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
+ int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
+ int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
+ int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
+ int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
+ int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
+ int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
+ int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
+ int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
+ int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
+ int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
+
+#define dct_long_mul(out, inq, coeff) \
+ int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
+ int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
+
+#define dct_long_mac(out, acc, inq, coeff) \
+ int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
+ int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
+
+#define dct_widen(out, inq) \
+ int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
+ int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
+
+// wide add
+#define dct_wadd(out, a, b) \
+ int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
+ int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
+
+// wide sub
+#define dct_wsub(out, a, b) \
+ int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
+ int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
+
+// butterfly a/b, then shift using "shiftop" by "s" and pack
+#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
+ { \
+ dct_wadd(sum, a, b); \
+ dct_wsub(dif, a, b); \
+ out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
+ out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
+ }
+
+#define dct_pass(shiftop, shift) \
+ { \
+ /* even part */ \
+ int16x8_t sum26 = vaddq_s16(row2, row6); \
+ dct_long_mul(p1e, sum26, rot0_0); \
+ dct_long_mac(t2e, p1e, row6, rot0_1); \
+ dct_long_mac(t3e, p1e, row2, rot0_2); \
+ int16x8_t sum04 = vaddq_s16(row0, row4); \
+ int16x8_t dif04 = vsubq_s16(row0, row4); \
+ dct_widen(t0e, sum04); \
+ dct_widen(t1e, dif04); \
+ dct_wadd(x0, t0e, t3e); \
+ dct_wsub(x3, t0e, t3e); \
+ dct_wadd(x1, t1e, t2e); \
+ dct_wsub(x2, t1e, t2e); \
+ /* odd part */ \
+ int16x8_t sum15 = vaddq_s16(row1, row5); \
+ int16x8_t sum17 = vaddq_s16(row1, row7); \
+ int16x8_t sum35 = vaddq_s16(row3, row5); \
+ int16x8_t sum37 = vaddq_s16(row3, row7); \
+ int16x8_t sumodd = vaddq_s16(sum17, sum35); \
+ dct_long_mul(p5o, sumodd, rot1_0); \
+ dct_long_mac(p1o, p5o, sum17, rot1_1); \
+ dct_long_mac(p2o, p5o, sum35, rot1_2); \
+ dct_long_mul(p3o, sum37, rot2_0); \
+ dct_long_mul(p4o, sum15, rot2_1); \
+ dct_wadd(sump13o, p1o, p3o); \
+ dct_wadd(sump24o, p2o, p4o); \
+ dct_wadd(sump23o, p2o, p3o); \
+ dct_wadd(sump14o, p1o, p4o); \
+ dct_long_mac(x4, sump13o, row7, rot3_0); \
+ dct_long_mac(x5, sump24o, row5, rot3_1); \
+ dct_long_mac(x6, sump23o, row3, rot3_2); \
+ dct_long_mac(x7, sump14o, row1, rot3_3); \
+ dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
+ dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
+ dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
+ dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
+ }
+
+ // load
+ row0 = vld1q_s16(data + 0*8);
+ row1 = vld1q_s16(data + 1*8);
+ row2 = vld1q_s16(data + 2*8);
+ row3 = vld1q_s16(data + 3*8);
+ row4 = vld1q_s16(data + 4*8);
+ row5 = vld1q_s16(data + 5*8);
+ row6 = vld1q_s16(data + 6*8);
+ row7 = vld1q_s16(data + 7*8);
+
+ // add DC bias
+ row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
+
+ // column pass
+ dct_pass(vrshrn_n_s32, 10);
+
+ // 16bit 8x8 transpose
+ {
+// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
+// whether compilers actually get this is another story, sadly.
+#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
+#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
+#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
+
+ // pass 1
+ dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
+ dct_trn16(row2, row3);
+ dct_trn16(row4, row5);
+ dct_trn16(row6, row7);
+
+ // pass 2
+ dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
+ dct_trn32(row1, row3);
+ dct_trn32(row4, row6);
+ dct_trn32(row5, row7);
+
+ // pass 3
+ dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
+ dct_trn64(row1, row5);
+ dct_trn64(row2, row6);
+ dct_trn64(row3, row7);
+
+#undef dct_trn16
+#undef dct_trn32
+#undef dct_trn64
+ }
+
+ // row pass
+ // vrshrn_n_s32 only supports shifts up to 16, we need
+ // 17. so do a non-rounding shift of 16 first then follow
+ // up with a rounding shift by 1.
+ dct_pass(vshrn_n_s32, 16);
+
+ {
+ // pack and round
+ uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
+ uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
+ uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
+ uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
+ uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
+ uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
+ uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
+ uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
+
+ // again, these can translate into one instruction, but often don't.
+#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
+#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
+#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
+
+ // sadly can't use interleaved stores here since we only write
+ // 8 bytes to each scan line!
+
+ // 8x8 8-bit transpose pass 1
+ dct_trn8_8(p0, p1);
+ dct_trn8_8(p2, p3);
+ dct_trn8_8(p4, p5);
+ dct_trn8_8(p6, p7);
+
+ // pass 2
+ dct_trn8_16(p0, p2);
+ dct_trn8_16(p1, p3);
+ dct_trn8_16(p4, p6);
+ dct_trn8_16(p5, p7);
+
+ // pass 3
+ dct_trn8_32(p0, p4);
+ dct_trn8_32(p1, p5);
+ dct_trn8_32(p2, p6);
+ dct_trn8_32(p3, p7);
+
+ // store
+ vst1_u8(out, p0); out += out_stride;
+ vst1_u8(out, p1); out += out_stride;
+ vst1_u8(out, p2); out += out_stride;
+ vst1_u8(out, p3); out += out_stride;
+ vst1_u8(out, p4); out += out_stride;
+ vst1_u8(out, p5); out += out_stride;
+ vst1_u8(out, p6); out += out_stride;
+ vst1_u8(out, p7);
+
+#undef dct_trn8_8
+#undef dct_trn8_16
+#undef dct_trn8_32
+ }
+
+#undef dct_long_mul
+#undef dct_long_mac
+#undef dct_widen
+#undef dct_wadd
+#undef dct_wsub
+#undef dct_bfly32o
+#undef dct_pass
+}
+
+#endif // STBI_NEON
+
+#define STBI__MARKER_none 0xff
+// if there's a pending marker from the entropy stream, return that
+// otherwise, fetch from the stream and get a marker. if there's no
+// marker, return 0xff, which is never a valid marker value
+static stbi_uc stbi__get_marker(stbi__jpeg *j)
+{
+ stbi_uc x;
+ if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
+ x = stbi__get8(j->s);
+ if (x != 0xff) return STBI__MARKER_none;
+ while (x == 0xff)
+ x = stbi__get8(j->s); // consume repeated 0xff fill bytes
+ return x;
+}
+
+// in each scan, we'll have scan_n components, and the order
+// of the components is specified by order[]
+#define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
+
+// after a restart interval, stbi__jpeg_reset the entropy decoder and
+// the dc prediction
+static void stbi__jpeg_reset(stbi__jpeg *j)
+{
+ j->code_bits = 0;
+ j->code_buffer = 0;
+ j->nomore = 0;
+ j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
+ j->marker = STBI__MARKER_none;
+ j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
+ j->eob_run = 0;
+ // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
+ // since we don't even allow 1<<30 pixels
+}
+
+static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
+{
+ stbi__jpeg_reset(z);
+ if (!z->progressive) {
+ if (z->scan_n == 1) {
+ int i,j;
+ STBI_SIMD_ALIGN(short, data[64]);
+ int n = z->order[0];
+ // non-interleaved data, we just need to process one block at a time,
+ // in trivial scanline order
+ // number of blocks to do just depends on how many actual "pixels" this
+ // component has, independent of interleaved MCU blocking and such
+ int w = (z->img_comp[n].x+7) >> 3;
+ int h = (z->img_comp[n].y+7) >> 3;
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; ++i) {
+ int ha = z->img_comp[n].ha;
+ if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
+ z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
+ // every data block is an MCU, so countdown the restart interval
+ if (--z->todo <= 0) {
+ if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+ // if it's NOT a restart, then just bail, so we get corrupt data
+ // rather than no data
+ if (!STBI__RESTART(z->marker)) return 1;
+ stbi__jpeg_reset(z);
+ }
+ }
+ }
+ return 1;
+ } else { // interleaved
+ int i,j,k,x,y;
+ STBI_SIMD_ALIGN(short, data[64]);
+ for (j=0; j < z->img_mcu_y; ++j) {
+ for (i=0; i < z->img_mcu_x; ++i) {
+ // scan an interleaved mcu... process scan_n components in order
+ for (k=0; k < z->scan_n; ++k) {
+ int n = z->order[k];
+ // scan out an mcu's worth of this component; that's just determined
+ // by the basic H and V specified for the component
+ for (y=0; y < z->img_comp[n].v; ++y) {
+ for (x=0; x < z->img_comp[n].h; ++x) {
+ int x2 = (i*z->img_comp[n].h + x)*8;
+ int y2 = (j*z->img_comp[n].v + y)*8;
+ int ha = z->img_comp[n].ha;
+ if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
+ z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
+ }
+ }
+ }
+ // after all interleaved components, that's an interleaved MCU,
+ // so now count down the restart interval
+ if (--z->todo <= 0) {
+ if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+ if (!STBI__RESTART(z->marker)) return 1;
+ stbi__jpeg_reset(z);
+ }
+ }
+ }
+ return 1;
+ }
+ } else {
+ if (z->scan_n == 1) {
+ int i,j;
+ int n = z->order[0];
+ // non-interleaved data, we just need to process one block at a time,
+ // in trivial scanline order
+ // number of blocks to do just depends on how many actual "pixels" this
+ // component has, independent of interleaved MCU blocking and such
+ int w = (z->img_comp[n].x+7) >> 3;
+ int h = (z->img_comp[n].y+7) >> 3;
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; ++i) {
+ short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
+ if (z->spec_start == 0) {
+ if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
+ return 0;
+ } else {
+ int ha = z->img_comp[n].ha;
+ if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
+ return 0;
+ }
+ // every data block is an MCU, so count down the restart interval
+ if (--z->todo <= 0) {
+ if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+ if (!STBI__RESTART(z->marker)) return 1;
+ stbi__jpeg_reset(z);
+ }
+ }
+ }
+ return 1;
+ } else { // interleaved
+ int i,j,k,x,y;
+ for (j=0; j < z->img_mcu_y; ++j) {
+ for (i=0; i < z->img_mcu_x; ++i) {
+ // scan an interleaved mcu... process scan_n components in order
+ for (k=0; k < z->scan_n; ++k) {
+ int n = z->order[k];
+ // scan out an mcu's worth of this component; that's just determined
+ // by the basic H and V specified for the component
+ for (y=0; y < z->img_comp[n].v; ++y) {
+ for (x=0; x < z->img_comp[n].h; ++x) {
+ int x2 = (i*z->img_comp[n].h + x);
+ int y2 = (j*z->img_comp[n].v + y);
+ short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
+ if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
+ return 0;
+ }
+ }
+ }
+ // after all interleaved components, that's an interleaved MCU,
+ // so now count down the restart interval
+ if (--z->todo <= 0) {
+ if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+ if (!STBI__RESTART(z->marker)) return 1;
+ stbi__jpeg_reset(z);
+ }
+ }
+ }
+ return 1;
+ }
+ }
+}
+
+static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
+{
+ int i;
+ for (i=0; i < 64; ++i)
+ data[i] *= dequant[i];
+}
+
+static void stbi__jpeg_finish(stbi__jpeg *z)
+{
+ if (z->progressive) {
+ // dequantize and idct the data
+ int i,j,n;
+ for (n=0; n < z->s->img_n; ++n) {
+ int w = (z->img_comp[n].x+7) >> 3;
+ int h = (z->img_comp[n].y+7) >> 3;
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; ++i) {
+ short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
+ stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
+ z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
+ }
+ }
+ }
+ }
+}
+
+static int stbi__process_marker(stbi__jpeg *z, int m)
+{
+ int L;
+ switch (m) {
+ case STBI__MARKER_none: // no marker found
+ return stbi__err("expected marker","Corrupt JPEG");
+
+ case 0xDD: // DRI - specify restart interval
+ if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
+ z->restart_interval = stbi__get16be(z->s);
+ return 1;
+
+ case 0xDB: // DQT - define quantization table
+ L = stbi__get16be(z->s)-2;
+ while (L > 0) {
+ int q = stbi__get8(z->s);
+ int p = q >> 4, sixteen = (p != 0);
+ int t = q & 15,i;
+ if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
+ if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
+
+ for (i=0; i < 64; ++i)
+ z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
+ L -= (sixteen ? 129 : 65);
+ }
+ return L==0;
+
+ case 0xC4: // DHT - define huffman table
+ L = stbi__get16be(z->s)-2;
+ while (L > 0) {
+ stbi_uc *v;
+ int sizes[16],i,n=0;
+ int q = stbi__get8(z->s);
+ int tc = q >> 4;
+ int th = q & 15;
+ if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
+ for (i=0; i < 16; ++i) {
+ sizes[i] = stbi__get8(z->s);
+ n += sizes[i];
+ }
+ if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values!
+ L -= 17;
+ if (tc == 0) {
+ if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
+ v = z->huff_dc[th].values;
+ } else {
+ if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
+ v = z->huff_ac[th].values;
+ }
+ for (i=0; i < n; ++i)
+ v[i] = stbi__get8(z->s);
+ if (tc != 0)
+ stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
+ L -= n;
+ }
+ return L==0;
+ }
+
+ // check for comment block or APP blocks
+ if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
+ L = stbi__get16be(z->s);
+ if (L < 2) {
+ if (m == 0xFE)
+ return stbi__err("bad COM len","Corrupt JPEG");
+ else
+ return stbi__err("bad APP len","Corrupt JPEG");
+ }
+ L -= 2;
+
+ if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
+ static const unsigned char tag[5] = {'J','F','I','F','\0'};
+ int ok = 1;
+ int i;
+ for (i=0; i < 5; ++i)
+ if (stbi__get8(z->s) != tag[i])
+ ok = 0;
+ L -= 5;
+ if (ok)
+ z->jfif = 1;
+ } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
+ static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
+ int ok = 1;
+ int i;
+ for (i=0; i < 6; ++i)
+ if (stbi__get8(z->s) != tag[i])
+ ok = 0;
+ L -= 6;
+ if (ok) {
+ stbi__get8(z->s); // version
+ stbi__get16be(z->s); // flags0
+ stbi__get16be(z->s); // flags1
+ z->app14_color_transform = stbi__get8(z->s); // color transform
+ L -= 6;
+ }
+ }
+
+ stbi__skip(z->s, L);
+ return 1;
+ }
+
+ return stbi__err("unknown marker","Corrupt JPEG");
+}
+
+// after we see SOS
+static int stbi__process_scan_header(stbi__jpeg *z)
+{
+ int i;
+ int Ls = stbi__get16be(z->s);
+ z->scan_n = stbi__get8(z->s);
+ if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
+ if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
+ for (i=0; i < z->scan_n; ++i) {
+ int id = stbi__get8(z->s), which;
+ int q = stbi__get8(z->s);
+ for (which = 0; which < z->s->img_n; ++which)
+ if (z->img_comp[which].id == id)
+ break;
+ if (which == z->s->img_n) return 0; // no match
+ z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
+ z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
+ z->order[i] = which;
+ }
+
+ {
+ int aa;
+ z->spec_start = stbi__get8(z->s);
+ z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
+ aa = stbi__get8(z->s);
+ z->succ_high = (aa >> 4);
+ z->succ_low = (aa & 15);
+ if (z->progressive) {
+ if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
+ return stbi__err("bad SOS", "Corrupt JPEG");
+ } else {
+ if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
+ if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
+ z->spec_end = 63;
+ }
+ }
+
+ return 1;
+}
+
+static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
+{
+ int i;
+ for (i=0; i < ncomp; ++i) {
+ if (z->img_comp[i].raw_data) {
+ STBI_FREE(z->img_comp[i].raw_data);
+ z->img_comp[i].raw_data = NULL;
+ z->img_comp[i].data = NULL;
+ }
+ if (z->img_comp[i].raw_coeff) {
+ STBI_FREE(z->img_comp[i].raw_coeff);
+ z->img_comp[i].raw_coeff = 0;
+ z->img_comp[i].coeff = 0;
+ }
+ if (z->img_comp[i].linebuf) {
+ STBI_FREE(z->img_comp[i].linebuf);
+ z->img_comp[i].linebuf = NULL;
+ }
+ }
+ return why;
+}
+
+static int stbi__process_frame_header(stbi__jpeg *z, int scan)
+{
+ stbi__context *s = z->s;
+ int Lf,p,i,q, h_max=1,v_max=1,c;
+ Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
+ p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
+ s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
+ s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
+ if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+ if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+ c = stbi__get8(s);
+ if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
+ s->img_n = c;
+ for (i=0; i < c; ++i) {
+ z->img_comp[i].data = NULL;
+ z->img_comp[i].linebuf = NULL;
+ }
+
+ if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
+
+ z->rgb = 0;
+ for (i=0; i < s->img_n; ++i) {
+ static const unsigned char rgb[3] = { 'R', 'G', 'B' };
+ z->img_comp[i].id = stbi__get8(s);
+ if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
+ ++z->rgb;
+ q = stbi__get8(s);
+ z->img_comp[i].h = (q >> 4); if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
+ z->img_comp[i].v = q & 15; if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
+ z->img_comp[i].tq = stbi__get8(s); if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
+ }
+
+ if (scan != STBI__SCAN_load) return 1;
+
+ if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
+
+ for (i=0; i < s->img_n; ++i) {
+ if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
+ if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
+ }
+
+ // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
+ // and I've never seen a non-corrupted JPEG file actually use them
+ for (i=0; i < s->img_n; ++i) {
+ if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
+ if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
+ }
+
+ // compute interleaved mcu info
+ z->img_h_max = h_max;
+ z->img_v_max = v_max;
+ z->img_mcu_w = h_max * 8;
+ z->img_mcu_h = v_max * 8;
+ // these sizes can't be more than 17 bits
+ z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
+ z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
+
+ for (i=0; i < s->img_n; ++i) {
+ // number of effective pixels (e.g. for non-interleaved MCU)
+ z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
+ z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
+ // to simplify generation, we'll allocate enough memory to decode
+ // the bogus oversized data from using interleaved MCUs and their
+ // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
+ // discard the extra data until colorspace conversion
+ //
+ // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
+ // so these muls can't overflow with 32-bit ints (which we require)
+ z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
+ z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
+ z->img_comp[i].coeff = 0;
+ z->img_comp[i].raw_coeff = 0;
+ z->img_comp[i].linebuf = NULL;
+ z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
+ if (z->img_comp[i].raw_data == NULL)
+ return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
+ // align blocks for idct using mmx/sse
+ z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
+ if (z->progressive) {
+ // w2, h2 are multiples of 8 (see above)
+ z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
+ z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
+ z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
+ if (z->img_comp[i].raw_coeff == NULL)
+ return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
+ z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
+ }
+ }
+
+ return 1;
+}
+
+// use comparisons since in some cases we handle more than one case (e.g. SOF)
+#define stbi__DNL(x) ((x) == 0xdc)
+#define stbi__SOI(x) ((x) == 0xd8)
+#define stbi__EOI(x) ((x) == 0xd9)
+#define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
+#define stbi__SOS(x) ((x) == 0xda)
+
+#define stbi__SOF_progressive(x) ((x) == 0xc2)
+
+static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
+{
+ int m;
+ z->jfif = 0;
+ z->app14_color_transform = -1; // valid values are 0,1,2
+ z->marker = STBI__MARKER_none; // initialize cached marker to empty
+ m = stbi__get_marker(z);
+ if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
+ if (scan == STBI__SCAN_type) return 1;
+ m = stbi__get_marker(z);
+ while (!stbi__SOF(m)) {
+ if (!stbi__process_marker(z,m)) return 0;
+ m = stbi__get_marker(z);
+ while (m == STBI__MARKER_none) {
+ // some files have extra padding after their blocks, so ok, we'll scan
+ if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
+ m = stbi__get_marker(z);
+ }
+ }
+ z->progressive = stbi__SOF_progressive(m);
+ if (!stbi__process_frame_header(z, scan)) return 0;
+ return 1;
+}
+
+static stbi_uc stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
+{
+ // some JPEGs have junk at end, skip over it but if we find what looks
+ // like a valid marker, resume there
+ while (!stbi__at_eof(j->s)) {
+ stbi_uc x = stbi__get8(j->s);
+ while (x == 0xff) { // might be a marker
+ if (stbi__at_eof(j->s)) return STBI__MARKER_none;
+ x = stbi__get8(j->s);
+ if (x != 0x00 && x != 0xff) {
+ // not a stuffed zero or lead-in to another marker, looks
+ // like an actual marker, return it
+ return x;
+ }
+ // stuffed zero has x=0 now which ends the loop, meaning we go
+ // back to regular scan loop.
+ // repeated 0xff keeps trying to read the next byte of the marker.
+ }
+ }
+ return STBI__MARKER_none;
+}
+
+// decode image to YCbCr format
+static int stbi__decode_jpeg_image(stbi__jpeg *j)
+{
+ int m;
+ for (m = 0; m < 4; m++) {
+ j->img_comp[m].raw_data = NULL;
+ j->img_comp[m].raw_coeff = NULL;
+ }
+ j->restart_interval = 0;
+ if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
+ m = stbi__get_marker(j);
+ while (!stbi__EOI(m)) {
+ if (stbi__SOS(m)) {
+ if (!stbi__process_scan_header(j)) return 0;
+ if (!stbi__parse_entropy_coded_data(j)) return 0;
+ if (j->marker == STBI__MARKER_none ) {
+ j->marker = stbi__skip_jpeg_junk_at_end(j);
+ // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
+ }
+ m = stbi__get_marker(j);
+ if (STBI__RESTART(m))
+ m = stbi__get_marker(j);
+ } else if (stbi__DNL(m)) {
+ int Ld = stbi__get16be(j->s);
+ stbi__uint32 NL = stbi__get16be(j->s);
+ if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
+ if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
+ m = stbi__get_marker(j);
+ } else {
+ if (!stbi__process_marker(j, m)) return 1;
+ m = stbi__get_marker(j);
+ }
+ }
+ if (j->progressive)
+ stbi__jpeg_finish(j);
+ return 1;
+}
+
+// static jfif-centered resampling (across block boundaries)
+
+typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
+ int w, int hs);
+
+#define stbi__div4(x) ((stbi_uc) ((x) >> 2))
+
+static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+ STBI_NOTUSED(out);
+ STBI_NOTUSED(in_far);
+ STBI_NOTUSED(w);
+ STBI_NOTUSED(hs);
+ return in_near;
+}
+
+static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+ // need to generate two samples vertically for every one in input
+ int i;
+ STBI_NOTUSED(hs);
+ for (i=0; i < w; ++i)
+ out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
+ return out;
+}
+
+static stbi_uc* stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+ // need to generate two samples horizontally for every one in input
+ int i;
+ stbi_uc *input = in_near;
+
+ if (w == 1) {
+ // if only one sample, can't do any interpolation
+ out[0] = out[1] = input[0];
+ return out;
+ }
+
+ out[0] = input[0];
+ out[1] = stbi__div4(input[0]*3 + input[1] + 2);
+ for (i=1; i < w-1; ++i) {
+ int n = 3*input[i]+2;
+ out[i*2+0] = stbi__div4(n+input[i-1]);
+ out[i*2+1] = stbi__div4(n+input[i+1]);
+ }
+ out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
+ out[i*2+1] = input[w-1];
+
+ STBI_NOTUSED(in_far);
+ STBI_NOTUSED(hs);
+
+ return out;
+}
+
+#define stbi__div16(x) ((stbi_uc) ((x) >> 4))
+
+static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+ // need to generate 2x2 samples for every one in input
+ int i,t0,t1;
+ if (w == 1) {
+ out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
+ return out;
+ }
+
+ t1 = 3*in_near[0] + in_far[0];
+ out[0] = stbi__div4(t1+2);
+ for (i=1; i < w; ++i) {
+ t0 = t1;
+ t1 = 3*in_near[i]+in_far[i];
+ out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
+ out[i*2 ] = stbi__div16(3*t1 + t0 + 8);
+ }
+ out[w*2-1] = stbi__div4(t1+2);
+
+ STBI_NOTUSED(hs);
+
+ return out;
+}
+
+#if defined(STBI_SSE2) || defined(STBI_NEON)
+static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+ // need to generate 2x2 samples for every one in input
+ int i=0,t0,t1;
+
+ if (w == 1) {
+ out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
+ return out;
+ }
+
+ t1 = 3*in_near[0] + in_far[0];
+ // process groups of 8 pixels for as long as we can.
+ // note we can't handle the last pixel in a row in this loop
+ // because we need to handle the filter boundary conditions.
+ for (; i < ((w-1) & ~7); i += 8) {
+#if defined(STBI_SSE2)
+ // load and perform the vertical filtering pass
+ // this uses 3*x + y = 4*x + (y - x)
+ __m128i zero = _mm_setzero_si128();
+ __m128i farb = _mm_loadl_epi64((__m128i *) (in_far + i));
+ __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
+ __m128i farw = _mm_unpacklo_epi8(farb, zero);
+ __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
+ __m128i diff = _mm_sub_epi16(farw, nearw);
+ __m128i nears = _mm_slli_epi16(nearw, 2);
+ __m128i curr = _mm_add_epi16(nears, diff); // current row
+
+ // horizontal filter works the same based on shifted vers of current
+ // row. "prev" is current row shifted right by 1 pixel; we need to
+ // insert the previous pixel value (from t1).
+ // "next" is current row shifted left by 1 pixel, with first pixel
+ // of next block of 8 pixels added in.
+ __m128i prv0 = _mm_slli_si128(curr, 2);
+ __m128i nxt0 = _mm_srli_si128(curr, 2);
+ __m128i prev = _mm_insert_epi16(prv0, t1, 0);
+ __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
+
+ // horizontal filter, polyphase implementation since it's convenient:
+ // even pixels = 3*cur + prev = cur*4 + (prev - cur)
+ // odd pixels = 3*cur + next = cur*4 + (next - cur)
+ // note the shared term.
+ __m128i bias = _mm_set1_epi16(8);
+ __m128i curs = _mm_slli_epi16(curr, 2);
+ __m128i prvd = _mm_sub_epi16(prev, curr);
+ __m128i nxtd = _mm_sub_epi16(next, curr);
+ __m128i curb = _mm_add_epi16(curs, bias);
+ __m128i even = _mm_add_epi16(prvd, curb);
+ __m128i odd = _mm_add_epi16(nxtd, curb);
+
+ // interleave even and odd pixels, then undo scaling.
+ __m128i int0 = _mm_unpacklo_epi16(even, odd);
+ __m128i int1 = _mm_unpackhi_epi16(even, odd);
+ __m128i de0 = _mm_srli_epi16(int0, 4);
+ __m128i de1 = _mm_srli_epi16(int1, 4);
+
+ // pack and write output
+ __m128i outv = _mm_packus_epi16(de0, de1);
+ _mm_storeu_si128((__m128i *) (out + i*2), outv);
+#elif defined(STBI_NEON)
+ // load and perform the vertical filtering pass
+ // this uses 3*x + y = 4*x + (y - x)
+ uint8x8_t farb = vld1_u8(in_far + i);
+ uint8x8_t nearb = vld1_u8(in_near + i);
+ int16x8_t diff = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
+ int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
+ int16x8_t curr = vaddq_s16(nears, diff); // current row
+
+ // horizontal filter works the same based on shifted vers of current
+ // row. "prev" is current row shifted right by 1 pixel; we need to
+ // insert the previous pixel value (from t1).
+ // "next" is current row shifted left by 1 pixel, with first pixel
+ // of next block of 8 pixels added in.
+ int16x8_t prv0 = vextq_s16(curr, curr, 7);
+ int16x8_t nxt0 = vextq_s16(curr, curr, 1);
+ int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
+ int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
+
+ // horizontal filter, polyphase implementation since it's convenient:
+ // even pixels = 3*cur + prev = cur*4 + (prev - cur)
+ // odd pixels = 3*cur + next = cur*4 + (next - cur)
+ // note the shared term.
+ int16x8_t curs = vshlq_n_s16(curr, 2);
+ int16x8_t prvd = vsubq_s16(prev, curr);
+ int16x8_t nxtd = vsubq_s16(next, curr);
+ int16x8_t even = vaddq_s16(curs, prvd);
+ int16x8_t odd = vaddq_s16(curs, nxtd);
+
+ // undo scaling and round, then store with even/odd phases interleaved
+ uint8x8x2_t o;
+ o.val[0] = vqrshrun_n_s16(even, 4);
+ o.val[1] = vqrshrun_n_s16(odd, 4);
+ vst2_u8(out + i*2, o);
+#endif
+
+ // "previous" value for next iter
+ t1 = 3*in_near[i+7] + in_far[i+7];
+ }
+
+ t0 = t1;
+ t1 = 3*in_near[i] + in_far[i];
+ out[i*2] = stbi__div16(3*t1 + t0 + 8);
+
+ for (++i; i < w; ++i) {
+ t0 = t1;
+ t1 = 3*in_near[i]+in_far[i];
+ out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
+ out[i*2 ] = stbi__div16(3*t1 + t0 + 8);
+ }
+ out[w*2-1] = stbi__div4(t1+2);
+
+ STBI_NOTUSED(hs);
+
+ return out;
+}
+#endif
+
+static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+ // resample with nearest-neighbor
+ int i,j;
+ STBI_NOTUSED(in_far);
+ for (i=0; i < w; ++i)
+ for (j=0; j < hs; ++j)
+ out[i*hs+j] = in_near[i];
+ return out;
+}
+
+// this is a reduced-precision calculation of YCbCr-to-RGB introduced
+// to make sure the code produces the same results in both SIMD and scalar
+#define stbi__float2fixed(x) (((int) ((x) * 4096.0f + 0.5f)) << 8)
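+// The coefficients below are thus stored as 20-bit fixed point; for example
+// (worked by hand as a sanity check), stbi__float2fixed(1.40200f) is
+// ((int)(1.40200f*4096.0f + 0.5f)) << 8 = 5743 << 8 = 1470208, and
+// 1470208 / 2^20 = 1.40216..., matching the (y[i] << 20) scale of y_fixed.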
+static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
+{
+ int i;
+ for (i=0; i < count; ++i) {
+ int y_fixed = (y[i] << 20) + (1<<19); // rounding
+ int r,g,b;
+ int cr = pcr[i] - 128;
+ int cb = pcb[i] - 128;
+ r = y_fixed + cr* stbi__float2fixed(1.40200f);
+ g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
+ b = y_fixed + cb* stbi__float2fixed(1.77200f);
+ r >>= 20;
+ g >>= 20;
+ b >>= 20;
+ if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
+ if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
+ if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
+ out[0] = (stbi_uc)r;
+ out[1] = (stbi_uc)g;
+ out[2] = (stbi_uc)b;
+ out[3] = 255;
+ out += step;
+ }
+}
+
+#if defined(STBI_SSE2) || defined(STBI_NEON)
+static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
+{
+ int i = 0;
+
+#ifdef STBI_SSE2
+ // step == 3 is pretty ugly on the final interleave, and i'm not convinced
+ // it's useful in practice (you wouldn't use it for textures, for example).
+ // so just accelerate step == 4 case.
+ if (step == 4) {
+ // this is a fairly straightforward implementation and not super-optimized.
+ __m128i signflip = _mm_set1_epi8(-0x80);
+ __m128i cr_const0 = _mm_set1_epi16( (short) ( 1.40200f*4096.0f+0.5f));
+ __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
+ __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
+ __m128i cb_const1 = _mm_set1_epi16( (short) ( 1.77200f*4096.0f+0.5f));
+ __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
+ __m128i xw = _mm_set1_epi16(255); // alpha channel
+
+ for (; i+7 < count; i += 8) {
+ // load
+ __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
+ __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
+ __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
+ __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
+ __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
+
+ // unpack to short (and left-shift cr, cb by 8)
+ __m128i yw = _mm_unpacklo_epi8(y_bias, y_bytes);
+ __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
+ __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
+
+ // color transform
+ __m128i yws = _mm_srli_epi16(yw, 4);
+ __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
+ __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
+ __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
+ __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
+ __m128i rws = _mm_add_epi16(cr0, yws);
+ __m128i gwt = _mm_add_epi16(cb0, yws);
+ __m128i bws = _mm_add_epi16(yws, cb1);
+ __m128i gws = _mm_add_epi16(gwt, cr1);
+
+ // descale
+ __m128i rw = _mm_srai_epi16(rws, 4);
+ __m128i bw = _mm_srai_epi16(bws, 4);
+ __m128i gw = _mm_srai_epi16(gws, 4);
+
+ // back to byte, set up for transpose
+ __m128i brb = _mm_packus_epi16(rw, bw);
+ __m128i gxb = _mm_packus_epi16(gw, xw);
+
+ // transpose to interleave channels
+ __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
+ __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
+ __m128i o0 = _mm_unpacklo_epi16(t0, t1);
+ __m128i o1 = _mm_unpackhi_epi16(t0, t1);
+
+ // store
+ _mm_storeu_si128((__m128i *) (out + 0), o0);
+ _mm_storeu_si128((__m128i *) (out + 16), o1);
+ out += 32;
+ }
+ }
+#endif
+
+#ifdef STBI_NEON
+ // in this version, step=3 support would be easy to add. but is there demand?
+ if (step == 4) {
+ // this is a fairly straightforward implementation and not super-optimized.
+ uint8x8_t signflip = vdup_n_u8(0x80);
+ int16x8_t cr_const0 = vdupq_n_s16( (short) ( 1.40200f*4096.0f+0.5f));
+ int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
+ int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
+ int16x8_t cb_const1 = vdupq_n_s16( (short) ( 1.77200f*4096.0f+0.5f));
+
+ for (; i+7 < count; i += 8) {
+ // load
+ uint8x8_t y_bytes = vld1_u8(y + i);
+ uint8x8_t cr_bytes = vld1_u8(pcr + i);
+ uint8x8_t cb_bytes = vld1_u8(pcb + i);
+ int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
+ int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
+
+ // expand to s16
+ int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
+ int16x8_t crw = vshll_n_s8(cr_biased, 7);
+ int16x8_t cbw = vshll_n_s8(cb_biased, 7);
+
+ // color transform
+ int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
+ int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
+ int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
+ int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
+ int16x8_t rws = vaddq_s16(yws, cr0);
+ int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
+ int16x8_t bws = vaddq_s16(yws, cb1);
+
+ // undo scaling, round, convert to byte
+ uint8x8x4_t o;
+ o.val[0] = vqrshrun_n_s16(rws, 4);
+ o.val[1] = vqrshrun_n_s16(gws, 4);
+ o.val[2] = vqrshrun_n_s16(bws, 4);
+ o.val[3] = vdup_n_u8(255);
+
+ // store, interleaving r/g/b/a
+ vst4_u8(out, o);
+ out += 8*4;
+ }
+ }
+#endif
+
+ for (; i < count; ++i) {
+ int y_fixed = (y[i] << 20) + (1<<19); // rounding
+ int r,g,b;
+ int cr = pcr[i] - 128;
+ int cb = pcb[i] - 128;
+ r = y_fixed + cr* stbi__float2fixed(1.40200f);
+ g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
+ b = y_fixed + cb* stbi__float2fixed(1.77200f);
+ r >>= 20;
+ g >>= 20;
+ b >>= 20;
+ if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
+ if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
+ if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
+ out[0] = (stbi_uc)r;
+ out[1] = (stbi_uc)g;
+ out[2] = (stbi_uc)b;
+ out[3] = 255;
+ out += step;
+ }
+}
+#endif
+
+// set up the kernels
+static void stbi__setup_jpeg(stbi__jpeg *j)
+{
+ j->idct_block_kernel = stbi__idct_block;
+ j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
+ j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
+
+#ifdef STBI_SSE2
+ if (stbi__sse2_available()) {
+ j->idct_block_kernel = stbi__idct_simd;
+ j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
+ j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
+ }
+#endif
+
+#ifdef STBI_NEON
+ j->idct_block_kernel = stbi__idct_simd;
+ j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
+ j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
+#endif
+}
+
+// clean up the temporary component buffers
+static void stbi__cleanup_jpeg(stbi__jpeg *j)
+{
+ stbi__free_jpeg_components(j, j->s->img_n, 0);
+}
+
+typedef struct
+{
+ resample_row_func resample;
+ stbi_uc *line0,*line1;
+ int hs,vs; // expansion factor in each axis
+ int w_lores; // horizontal pixels pre-expansion
+ int ystep; // how far through vertical expansion we are
+ int ypos; // which pre-expansion row we're on
+} stbi__resample;
+
+// fast 0..255 * 0..255 => 0..255 rounded multiplication
+static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
+{
+ unsigned int t = x*y + 128;
+ return (stbi_uc) ((t + (t >>8)) >> 8);
+}
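+
+// Sanity check of the rounding trick above, worked by hand:
+// (t + (t >> 8)) >> 8 computes round(x*y / 255) for 8-bit inputs.
+// x = 255, y = 255: t = 65025+128 = 65153; t>>8 = 254;
+//   (65153+254)>>8 = 65407>>8 = 255.
+// x = 128, y = 128: t = 16384+128 = 16512; t>>8 = 64;
+//   (16512+64)>>8 = 16576>>8 = 64 = round(16384/255).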
+
+static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
+{
+ int n, decode_n, is_rgb;
+ z->s->img_n = 0; // make stbi__cleanup_jpeg safe
+
+ // validate req_comp
+ if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
+
+ // load a jpeg image from whichever source, but leave in YCbCr format
+ if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
+
+ // determine actual number of components to generate
+ n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
+
+ is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
+
+ if (z->s->img_n == 3 && n < 3 && !is_rgb)
+ decode_n = 1;
+ else
+ decode_n = z->s->img_n;
+
+ // nothing to do if no components requested; check this now to avoid
+ // accessing uninitialized coutput[0] later
+ if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
+
+ // resample and color-convert
+ {
+ int k;
+ unsigned int i,j;
+ stbi_uc *output;
+ stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
+
+ stbi__resample res_comp[4];
+
+ for (k=0; k < decode_n; ++k) {
+ stbi__resample *r = &res_comp[k];
+
+ // allocate line buffer big enough for upsampling off the edges
+ // with upsample factor of 4
+ z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
+ if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
+
+ r->hs = z->img_h_max / z->img_comp[k].h;
+ r->vs = z->img_v_max / z->img_comp[k].v;
+ r->ystep = r->vs >> 1;
+ r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
+ r->ypos = 0;
+ r->line0 = r->line1 = z->img_comp[k].data;
+
+ if (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
+ else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
+ else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
+ else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
+ else r->resample = stbi__resample_row_generic;
+ }
+
+ // can't error after this, so this is safe
+ output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
+ if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
+
+ // now go ahead and resample
+ for (j=0; j < z->s->img_y; ++j) {
+ stbi_uc *out = output + n * z->s->img_x * j;
+ for (k=0; k < decode_n; ++k) {
+ stbi__resample *r = &res_comp[k];
+ int y_bot = r->ystep >= (r->vs >> 1);
+ coutput[k] = r->resample(z->img_comp[k].linebuf,
+ y_bot ? r->line1 : r->line0,
+ y_bot ? r->line0 : r->line1,
+ r->w_lores, r->hs);
+ if (++r->ystep >= r->vs) {
+ r->ystep = 0;
+ r->line0 = r->line1;
+ if (++r->ypos < z->img_comp[k].y)
+ r->line1 += z->img_comp[k].w2;
+ }
+ }
+ if (n >= 3) {
+ stbi_uc *y = coutput[0];
+ if (z->s->img_n == 3) {
+ if (is_rgb) {
+ for (i=0; i < z->s->img_x; ++i) {
+ out[0] = y[i];
+ out[1] = coutput[1][i];
+ out[2] = coutput[2][i];
+ out[3] = 255;
+ out += n;
+ }
+ } else {
+ z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
+ }
+ } else if (z->s->img_n == 4) {
+ if (z->app14_color_transform == 0) { // CMYK
+ for (i=0; i < z->s->img_x; ++i) {
+ stbi_uc m = coutput[3][i];
+ out[0] = stbi__blinn_8x8(coutput[0][i], m);
+ out[1] = stbi__blinn_8x8(coutput[1][i], m);
+ out[2] = stbi__blinn_8x8(coutput[2][i], m);
+ out[3] = 255;
+ out += n;
+ }
+ } else if (z->app14_color_transform == 2) { // YCCK
+ z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
+ for (i=0; i < z->s->img_x; ++i) {
+ stbi_uc m = coutput[3][i];
+ out[0] = stbi__blinn_8x8(255 - out[0], m);
+ out[1] = stbi__blinn_8x8(255 - out[1], m);
+ out[2] = stbi__blinn_8x8(255 - out[2], m);
+ out += n;
+ }
+ } else { // YCbCr + alpha? Ignore the fourth channel for now
+ z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
+ }
+ } else
+ for (i=0; i < z->s->img_x; ++i) {
+ out[0] = out[1] = out[2] = y[i];
+ out[3] = 255; // not used if n==3
+ out += n;
+ }
+ } else {
+ if (is_rgb) {
+ if (n == 1)
+ for (i=0; i < z->s->img_x; ++i)
+ *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
+ else {
+ for (i=0; i < z->s->img_x; ++i, out += 2) {
+ out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
+ out[1] = 255;
+ }
+ }
+ } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
+ for (i=0; i < z->s->img_x; ++i) {
+ stbi_uc m = coutput[3][i];
+ stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
+ stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
+ stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
+ out[0] = stbi__compute_y(r, g, b);
+ out[1] = 255;
+ out += n;
+ }
+ } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
+ for (i=0; i < z->s->img_x; ++i) {
+ out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
+ out[1] = 255;
+ out += n;
+ }
+ } else {
+ stbi_uc *y = coutput[0];
+ if (n == 1)
+ for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
+ else
+ for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
+ }
+ }
+ }
+ stbi__cleanup_jpeg(z);
+ *out_x = z->s->img_x;
+ *out_y = z->s->img_y;
+ if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
+ return output;
+ }
+}
+
+static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ unsigned char* result;
+ stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
+ if (!j) return stbi__errpuc("outofmem", "Out of memory");
+ memset(j, 0, sizeof(stbi__jpeg));
+ STBI_NOTUSED(ri);
+ j->s = s;
+ stbi__setup_jpeg(j);
+ result = load_jpeg_image(j, x,y,comp,req_comp);
+ STBI_FREE(j);
+ return result;
+}
+
+static int stbi__jpeg_test(stbi__context *s)
+{
+ int r;
+ stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
+ if (!j) return stbi__err("outofmem", "Out of memory");
+ memset(j, 0, sizeof(stbi__jpeg));
+ j->s = s;
+ stbi__setup_jpeg(j);
+ r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
+ stbi__rewind(s);
+ STBI_FREE(j);
+ return r;
+}
+
+static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
+{
+ if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
+ stbi__rewind( j->s );
+ return 0;
+ }
+ if (x) *x = j->s->img_x;
+ if (y) *y = j->s->img_y;
+ if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
+ return 1;
+}
+
+static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ int result;
+ stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
+ if (!j) return stbi__err("outofmem", "Out of memory");
+ memset(j, 0, sizeof(stbi__jpeg));
+ j->s = s;
+ result = stbi__jpeg_info_raw(j, x, y, comp);
+ STBI_FREE(j);
+ return result;
+}
+#endif
+
+// public domain zlib decode v0.2 Sean Barrett 2006-11-18
+// simple implementation
+// - all input must be provided in an upfront buffer
+// - all output is written to a single output buffer (can malloc/realloc)
+// performance
+// - fast huffman
+
+#ifndef STBI_NO_ZLIB
+
+// fast-way is faster to check than jpeg huffman, but slow way is slower
+#define STBI__ZFAST_BITS 9 // accelerate all cases in default tables
+#define STBI__ZFAST_MASK ((1 << STBI__ZFAST_BITS) - 1)
+#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
+
+// zlib-style huffman encoding
+// (jpeg packs codes from the left, zlib from the right, so we can't share code)
+typedef struct
+{
+ stbi__uint16 fast[1 << STBI__ZFAST_BITS];
+ stbi__uint16 firstcode[16];
+ int maxcode[17];
+ stbi__uint16 firstsymbol[16];
+ stbi_uc size[STBI__ZNSYMS];
+ stbi__uint16 value[STBI__ZNSYMS];
+} stbi__zhuffman;
+
+stbi_inline static int stbi__bitreverse16(int n)
+{
+ n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
+ n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
+ n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
+ n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
+ return n;
+}
+
+stbi_inline static int stbi__bit_reverse(int v, int bits)
+{
+ STBI_ASSERT(bits <= 16);
+ // to bit reverse n bits, reverse 16 and shift
+ // e.g. 11 bits, bit reverse and shift away 5
+ return stbi__bitreverse16(v) >> (16-bits);
+}
+
+static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
+{
+ int i,k=0;
+ int code, next_code[16], sizes[17];
+
+ // DEFLATE spec for generating codes
+ memset(sizes, 0, sizeof(sizes));
+ memset(z->fast, 0, sizeof(z->fast));
+ for (i=0; i < num; ++i)
+ ++sizes[sizelist[i]];
+ sizes[0] = 0;
+ for (i=1; i < 16; ++i)
+ if (sizes[i] > (1 << i))
+ return stbi__err("bad sizes", "Corrupt PNG");
+ code = 0;
+ for (i=1; i < 16; ++i) {
+ next_code[i] = code;
+ z->firstcode[i] = (stbi__uint16) code;
+ z->firstsymbol[i] = (stbi__uint16) k;
+ code = (code + sizes[i]);
+ if (sizes[i])
+ if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
+ z->maxcode[i] = code << (16-i); // preshift for inner loop
+ code <<= 1;
+ k += sizes[i];
+ }
+ z->maxcode[16] = 0x10000; // sentinel
+ for (i=0; i < num; ++i) {
+ int s = sizelist[i];
+ if (s) {
+ int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
+ stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
+ z->size [c] = (stbi_uc ) s;
+ z->value[c] = (stbi__uint16) i;
+ if (s <= STBI__ZFAST_BITS) {
+ int j = stbi__bit_reverse(next_code[s],s);
+ while (j < (1 << STBI__ZFAST_BITS)) {
+ z->fast[j] = fastv;
+ j += (1 << s);
+ }
+ }
+ ++next_code[s];
+ }
+ }
+ return 1;
+}
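+
+// worked example (illustrative, not part of the library): for four symbols
+// with code lengths {2,1,3,3}, the DEFLATE canonical construction above assigns
+//   symbol 1 (len 1) -> code 0
+//   symbol 0 (len 2) -> code 10
+//   symbol 2 (len 3) -> code 110
+//   symbol 3 (len 3) -> code 111
+// i.e. shorter codes sort first and codes of equal length are consecutive.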
+
+// zlib-from-memory implementation for PNG reading
+// because PNG allows splitting the zlib stream arbitrarily,
+// and it's annoying structurally to have PNG call ZLIB call PNG,
+// we require PNG read all the IDATs and combine them into a single
+// memory buffer
+
+typedef struct
+{
+ stbi_uc *zbuffer, *zbuffer_end;
+ int num_bits;
+ int hit_zeof_once;
+ stbi__uint32 code_buffer;
+
+ char *zout;
+ char *zout_start;
+ char *zout_end;
+ int z_expandable;
+
+ stbi__zhuffman z_length, z_distance;
+} stbi__zbuf;
+
+stbi_inline static int stbi__zeof(stbi__zbuf *z)
+{
+ return (z->zbuffer >= z->zbuffer_end);
+}
+
+stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
+{
+ return stbi__zeof(z) ? 0 : *z->zbuffer++;
+}
+
+static void stbi__fill_bits(stbi__zbuf *z)
+{
+ do {
+ if (z->code_buffer >= (1U << z->num_bits)) {
+ z->zbuffer = z->zbuffer_end; /* treat this as EOF so we fail. */
+ return;
+ }
+ z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
+ z->num_bits += 8;
+ } while (z->num_bits <= 24);
+}
+
+stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
+{
+ unsigned int k;
+ if (z->num_bits < n) stbi__fill_bits(z);
+ k = z->code_buffer & ((1 << n) - 1);
+ z->code_buffer >>= n;
+ z->num_bits -= n;
+ return k;
+}
+
+static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
+{
+ int b,s,k;
+ // not resolved by fast table, so compute it the slow way
+ // use jpeg approach, which requires MSbits at top
+ k = stbi__bit_reverse(a->code_buffer, 16);
+ for (s=STBI__ZFAST_BITS+1; ; ++s)
+ if (k < z->maxcode[s])
+ break;
+ if (s >= 16) return -1; // invalid code!
+ // code size is s, so:
+ b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
+ if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
+ if (z->size[b] != s) return -1; // was originally an assert, but report failure instead.
+ a->code_buffer >>= s;
+ a->num_bits -= s;
+ return z->value[b];
+}
+
+stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
+{
+ int b,s;
+ if (a->num_bits < 16) {
+ if (stbi__zeof(a)) {
+ if (!a->hit_zeof_once) {
+ // This is the first time we hit eof; insert 16 extra padding bits
+ // to allow us to keep going; if we actually consume any of them
+ // though, that is invalid data. This is caught later.
+ a->hit_zeof_once = 1;
+ a->num_bits += 16; // add 16 implicit zero bits
+ } else {
+ // We already inserted our extra 16 padding bits and are again
+ // out, this stream is actually prematurely terminated.
+ return -1;
+ }
+ } else {
+ stbi__fill_bits(a);
+ }
+ }
+ b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
+ if (b) {
+ s = b >> 9;
+ a->code_buffer >>= s;
+ a->num_bits -= s;
+ return b & 511;
+ }
+ return stbi__zhuffman_decode_slowpath(a, z);
+}
+
+static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes
+{
+ char *q;
+ unsigned int cur, limit, old_limit;
+ z->zout = zout;
+ if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
+ cur = (unsigned int) (z->zout - z->zout_start);
+ limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
+ if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
+ while (cur + n > limit) {
+ if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
+ limit *= 2;
+ }
+ q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
+ STBI_NOTUSED(old_limit);
+ if (q == NULL) return stbi__err("outofmem", "Out of memory");
+ z->zout_start = q;
+ z->zout = q + cur;
+ z->zout_end = q + limit;
+ return 1;
+}
+
+static const int stbi__zlength_base[31] = {
+ 3,4,5,6,7,8,9,10,11,13,
+ 15,17,19,23,27,31,35,43,51,59,
+ 67,83,99,115,131,163,195,227,258,0,0 };
+
+static const int stbi__zlength_extra[31]=
+{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
+
+static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
+257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
+
+static const int stbi__zdist_extra[32] =
+{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
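+
+// worked example (illustrative, not part of the library): length code 266
+// maps to index 266-257 = 9, so len starts at stbi__zlength_base[9] = 13 and
+// one extra bit is read, giving a match length of 13 or 14.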
+
+static int stbi__parse_huffman_block(stbi__zbuf *a)
+{
+ char *zout = a->zout;
+ for(;;) {
+ int z = stbi__zhuffman_decode(a, &a->z_length);
+ if (z < 256) {
+ if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
+ if (zout >= a->zout_end) {
+ if (!stbi__zexpand(a, zout, 1)) return 0;
+ zout = a->zout;
+ }
+ *zout++ = (char) z;
+ } else {
+ stbi_uc *p;
+ int len,dist;
+ if (z == 256) {
+ a->zout = zout;
+ if (a->hit_zeof_once && a->num_bits < 16) {
+ // The first time we hit zeof, we inserted 16 extra zero bits into our bit
+ // buffer so the decoder can just do its speculative decoding. But if we
+ // actually consumed any of those bits (which is the case when num_bits < 16),
+ // the stream actually read past the end so it is malformed.
+ return stbi__err("unexpected end","Corrupt PNG");
+ }
+ return 1;
+ }
+ if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data
+ z -= 257;
+ len = stbi__zlength_base[z];
+ if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
+ z = stbi__zhuffman_decode(a, &a->z_distance);
+ if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data
+ dist = stbi__zdist_base[z];
+ if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
+ if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
+ if (len > a->zout_end - zout) {
+ if (!stbi__zexpand(a, zout, len)) return 0;
+ zout = a->zout;
+ }
+ p = (stbi_uc *) (zout - dist);
+ if (dist == 1) { // run of one byte; common in images.
+ stbi_uc v = *p;
+ if (len) { do *zout++ = v; while (--len); }
+ } else {
+ if (len) { do *zout++ = *p++; while (--len); }
+ }
+ }
+ }
+}
+
+static int stbi__compute_huffman_codes(stbi__zbuf *a)
+{
+ static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
+ stbi__zhuffman z_codelength;
+ stbi_uc lencodes[286+32+137]; // padding for maximum single op
+ stbi_uc codelength_sizes[19];
+ int i,n;
+
+ int hlit = stbi__zreceive(a,5) + 257;
+ int hdist = stbi__zreceive(a,5) + 1;
+ int hclen = stbi__zreceive(a,4) + 4;
+ int ntot = hlit + hdist;
+
+ memset(codelength_sizes, 0, sizeof(codelength_sizes));
+ for (i=0; i < hclen; ++i) {
+ int s = stbi__zreceive(a,3);
+ codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
+ }
+ if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
+
+ n = 0;
+ while (n < ntot) {
+ int c = stbi__zhuffman_decode(a, &z_codelength);
+ if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
+ if (c < 16)
+ lencodes[n++] = (stbi_uc) c;
+ else {
+ stbi_uc fill = 0;
+ if (c == 16) {
+ c = stbi__zreceive(a,2)+3;
+ if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
+ fill = lencodes[n-1];
+ } else if (c == 17) {
+ c = stbi__zreceive(a,3)+3;
+ } else if (c == 18) {
+ c = stbi__zreceive(a,7)+11;
+ } else {
+ return stbi__err("bad codelengths", "Corrupt PNG");
+ }
+ if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
+ memset(lencodes+n, fill, c);
+ n += c;
+ }
+ }
+ if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
+ if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
+ if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
+ return 1;
+}
+
+static int stbi__parse_uncompressed_block(stbi__zbuf *a)
+{
+ stbi_uc header[4];
+ int len,nlen,k;
+ if (a->num_bits & 7)
+ stbi__zreceive(a, a->num_bits & 7); // discard
+ // drain the bit-packed data into header
+ k = 0;
+ while (a->num_bits > 0) {
+ header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
+ a->code_buffer >>= 8;
+ a->num_bits -= 8;
+ }
+ if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
+ // now fill header the normal way
+ while (k < 4)
+ header[k++] = stbi__zget8(a);
+ len = header[1] * 256 + header[0];
+ nlen = header[3] * 256 + header[2];
+ if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
+ if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
+ if (a->zout + len > a->zout_end)
+ if (!stbi__zexpand(a, a->zout, len)) return 0;
+ memcpy(a->zout, a->zbuffer, len);
+ a->zbuffer += len;
+ a->zout += len;
+ return 1;
+}
+
+static int stbi__parse_zlib_header(stbi__zbuf *a)
+{
+ int cmf = stbi__zget8(a);
+ int cm = cmf & 15;
+ /* int cinfo = cmf >> 4; */
+ int flg = stbi__zget8(a);
+ if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
+ if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
+ if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
+ if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
+ // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
+ return 1;
+}
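+
+// worked example (illustrative, not part of the library): the common zlib
+// header bytes 0x78 0x9C give cm = 8 (DEFLATE), flg bit 5 clear (no preset
+// dictionary), and (0x78*256 + 0x9C) = 30876 = 31*996, so the check passes.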
+
+static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
+{
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+ 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+ 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+ 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+ 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+ 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
+};
+static const stbi_uc stbi__zdefault_distance[32] =
+{
+ 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
+};
+/*
+Init algorithm:
+{
+ int i; // use <= to match clearly with spec
+ for (i=0; i <= 143; ++i) stbi__zdefault_length[i] = 8;
+ for ( ; i <= 255; ++i) stbi__zdefault_length[i] = 9;
+ for ( ; i <= 279; ++i) stbi__zdefault_length[i] = 7;
+ for ( ; i <= 287; ++i) stbi__zdefault_length[i] = 8;
+
+ for (i=0; i <= 31; ++i) stbi__zdefault_distance[i] = 5;
+}
+*/
+
+static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
+{
+ int final, type;
+ if (parse_header)
+ if (!stbi__parse_zlib_header(a)) return 0;
+ a->num_bits = 0;
+ a->code_buffer = 0;
+ a->hit_zeof_once = 0;
+ do {
+ final = stbi__zreceive(a,1);
+ type = stbi__zreceive(a,2);
+ if (type == 0) {
+ if (!stbi__parse_uncompressed_block(a)) return 0;
+ } else if (type == 3) {
+ return 0;
+ } else {
+ if (type == 1) {
+ // use fixed code lengths
+ if (!stbi__zbuild_huffman(&a->z_length , stbi__zdefault_length , STBI__ZNSYMS)) return 0;
+ if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance, 32)) return 0;
+ } else {
+ if (!stbi__compute_huffman_codes(a)) return 0;
+ }
+ if (!stbi__parse_huffman_block(a)) return 0;
+ }
+ } while (!final);
+ return 1;
+}
+
+static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
+{
+ a->zout_start = obuf;
+ a->zout = obuf;
+ a->zout_end = obuf + olen;
+ a->z_expandable = exp;
+
+ return stbi__parse_zlib(a, parse_header);
+}
+
+STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
+{
+ stbi__zbuf a;
+ char *p = (char *) stbi__malloc(initial_size);
+ if (p == NULL) return NULL;
+ a.zbuffer = (stbi_uc *) buffer;
+ a.zbuffer_end = (stbi_uc *) buffer + len;
+ if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
+ if (outlen) *outlen = (int) (a.zout - a.zout_start);
+ return a.zout_start;
+ } else {
+ STBI_FREE(a.zout_start);
+ return NULL;
+ }
+}
+
+STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
+{
+ return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
+}
+
+STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
+{
+ stbi__zbuf a;
+ char *p = (char *) stbi__malloc(initial_size);
+ if (p == NULL) return NULL;
+ a.zbuffer = (stbi_uc *) buffer;
+ a.zbuffer_end = (stbi_uc *) buffer + len;
+ if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
+ if (outlen) *outlen = (int) (a.zout - a.zout_start);
+ return a.zout_start;
+ } else {
+ STBI_FREE(a.zout_start);
+ return NULL;
+ }
+}
+
+STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
+{
+ stbi__zbuf a;
+ a.zbuffer = (stbi_uc *) ibuffer;
+ a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
+ if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
+ return (int) (a.zout - a.zout_start);
+ else
+ return -1;
+}
+
+STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
+{
+ stbi__zbuf a;
+ char *p = (char *) stbi__malloc(16384);
+ if (p == NULL) return NULL;
+ a.zbuffer = (stbi_uc *) buffer;
+ a.zbuffer_end = (stbi_uc *) buffer+len;
+ if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
+ if (outlen) *outlen = (int) (a.zout - a.zout_start);
+ return a.zout_start;
+ } else {
+ STBI_FREE(a.zout_start);
+ return NULL;
+ }
+}
+
+STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
+{
+ stbi__zbuf a;
+ a.zbuffer = (stbi_uc *) ibuffer;
+ a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
+ if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
+ return (int) (a.zout - a.zout_start);
+ else
+ return -1;
+}
+#endif
+
+// public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
+// simple implementation
+// - only 8-bit samples
+// - no CRC checking
+// - allocates lots of intermediate memory
+// - avoids problem of streaming data between subsystems
+// - avoids explicit window management
+// performance
+// - uses stb_zlib, a PD zlib implementation with fast huffman decoding
+
+#ifndef STBI_NO_PNG
+typedef struct
+{
+ stbi__uint32 length;
+ stbi__uint32 type;
+} stbi__pngchunk;
+
+static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
+{
+ stbi__pngchunk c;
+ c.length = stbi__get32be(s);
+ c.type = stbi__get32be(s);
+ return c;
+}
+
+static int stbi__check_png_header(stbi__context *s)
+{
+ static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
+ int i;
+ for (i=0; i < 8; ++i)
+ if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
+ return 1;
+}
+
+typedef struct
+{
+ stbi__context *s;
+ stbi_uc *idata, *expanded, *out;
+ int depth;
+} stbi__png;
+
+
+enum {
+ STBI__F_none=0,
+ STBI__F_sub=1,
+ STBI__F_up=2,
+ STBI__F_avg=3,
+ STBI__F_paeth=4,
+ // synthetic filter used for first scanline to avoid needing a dummy row of 0s
+ STBI__F_avg_first
+};
+
+static stbi_uc first_row_filter[5] =
+{
+ STBI__F_none,
+ STBI__F_sub,
+ STBI__F_none,
+ STBI__F_avg_first,
+ STBI__F_sub // Paeth with b=c=0 turns out to be equivalent to sub
+};
+
+static int stbi__paeth(int a, int b, int c)
+{
+ // This formulation looks very different from the reference in the PNG spec, but is
+ // actually equivalent and has favorable data dependencies and admits straightforward
+ // generation of branch-free code, which helps performance significantly.
+ int thresh = c*3 - (a + b);
+ int lo = a < b ? a : b;
+ int hi = a < b ? b : a;
+ int t0 = (hi <= thresh) ? lo : c;
+ int t1 = (thresh <= lo) ? hi : t0;
+ return t1;
+}
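+
+// worked example (illustrative, not part of the library): for a=100, b=200,
+// c=50 the spec predictor computes p = a+b-c = 250, with |p-a|=150, |p-b|=50,
+// |p-c|=200, choosing b. Above: thresh = 3*50-(100+200) = -150, lo=100,
+// hi=200; hi <= thresh is false so t0 = c, and thresh <= lo is true so
+// t1 = hi = 200 = b. Same result.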
+
+static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
+
+// adds an extra all-255 alpha channel
+// dest == src is legal
+// img_n must be 1 or 3
+static void stbi__create_png_alpha_expand8(stbi_uc *dest, stbi_uc *src, stbi__uint32 x, int img_n)
+{
+ int i;
+ // must process data backwards since we allow dest==src
+ if (img_n == 1) {
+ for (i=x-1; i >= 0; --i) {
+ dest[i*2+1] = 255;
+ dest[i*2+0] = src[i];
+ }
+ } else {
+ STBI_ASSERT(img_n == 3);
+ for (i=x-1; i >= 0; --i) {
+ dest[i*4+3] = 255;
+ dest[i*4+2] = src[i*3+2];
+ dest[i*4+1] = src[i*3+1];
+ dest[i*4+0] = src[i*3+0];
+ }
+ }
+}
+
+// create the png data from post-deflated data
+static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
+{
+ int bytes = (depth == 16 ? 2 : 1);
+ stbi__context *s = a->s;
+ stbi__uint32 i,j,stride = x*out_n*bytes;
+ stbi__uint32 img_len, img_width_bytes;
+ stbi_uc *filter_buf;
+ int all_ok = 1;
+ int k;
+ int img_n = s->img_n; // copy it into a local for later
+
+ int output_bytes = out_n*bytes;
+ int filter_bytes = img_n*bytes;
+ int width = x;
+
+ STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
+ a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
+ if (!a->out) return stbi__err("outofmem", "Out of memory");
+
+ // note: error exits here don't need to clean up a->out individually,
+ // stbi__do_png always does on error.
+ if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
+ img_width_bytes = (((img_n * x * depth) + 7) >> 3);
+ if (!stbi__mad2sizes_valid(img_width_bytes, y, img_width_bytes)) return stbi__err("too large", "Corrupt PNG");
+ img_len = (img_width_bytes + 1) * y;
+
+ // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
+ // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
+ // so just check for raw_len < img_len always.
+ if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
+
+ // Allocate two scan lines worth of filter workspace buffer.
+ filter_buf = (stbi_uc *) stbi__malloc_mad2(img_width_bytes, 2, 0);
+ if (!filter_buf) return stbi__err("outofmem", "Out of memory");
+
+ // Filtering for low-bit-depth images
+ if (depth < 8) {
+ filter_bytes = 1;
+ width = img_width_bytes;
+ }
+
+ for (j=0; j < y; ++j) {
+ // cur/prior filter buffers alternate
+ stbi_uc *cur = filter_buf + (j & 1)*img_width_bytes;
+ stbi_uc *prior = filter_buf + (~j & 1)*img_width_bytes;
+ stbi_uc *dest = a->out + stride*j;
+ int nk = width * filter_bytes;
+ int filter = *raw++;
+
+ // check filter type
+ if (filter > 4) {
+ all_ok = stbi__err("invalid filter","Corrupt PNG");
+ break;
+ }
+
+ // if first row, use special filter that doesn't sample previous row
+ if (j == 0) filter = first_row_filter[filter];
+
+ // perform actual filtering
+ switch (filter) {
+ case STBI__F_none:
+ memcpy(cur, raw, nk);
+ break;
+ case STBI__F_sub:
+ memcpy(cur, raw, filter_bytes);
+ for (k = filter_bytes; k < nk; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]);
+ break;
+ case STBI__F_up:
+ for (k = 0; k < nk; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
+ break;
+ case STBI__F_avg:
+ for (k = 0; k < filter_bytes; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1));
+ for (k = filter_bytes; k < nk; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1));
+ break;
+ case STBI__F_paeth:
+ for (k = 0; k < filter_bytes; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + prior[k]); // prior[k] == stbi__paeth(0,prior[k],0)
+ for (k = filter_bytes; k < nk; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes], prior[k], prior[k-filter_bytes]));
+ break;
+ case STBI__F_avg_first:
+ memcpy(cur, raw, filter_bytes);
+ for (k = filter_bytes; k < nk; ++k)
+ cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1));
+ break;
+ }
+
+ raw += nk;
+
+ // expand decoded bits in cur to dest, also adding an extra alpha channel if desired
+ if (depth < 8) {
+ stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
+ stbi_uc *in = cur;
+ stbi_uc *out = dest;
+ stbi_uc inb = 0;
+ stbi__uint32 nsmp = x*img_n;
+
+ // expand bits to bytes first
+ if (depth == 4) {
+ for (i=0; i < nsmp; ++i) {
+ if ((i & 1) == 0) inb = *in++;
+ *out++ = scale * (inb >> 4);
+ inb <<= 4;
+ }
+ } else if (depth == 2) {
+ for (i=0; i < nsmp; ++i) {
+ if ((i & 3) == 0) inb = *in++;
+ *out++ = scale * (inb >> 6);
+ inb <<= 2;
+ }
+ } else {
+ STBI_ASSERT(depth == 1);
+ for (i=0; i < nsmp; ++i) {
+ if ((i & 7) == 0) inb = *in++;
+ *out++ = scale * (inb >> 7);
+ inb <<= 1;
+ }
+ }
+
+ // insert alpha=255 values if desired
+ if (img_n != out_n)
+ stbi__create_png_alpha_expand8(dest, dest, x, img_n);
+ } else if (depth == 8) {
+ if (img_n == out_n)
+ memcpy(dest, cur, x*img_n);
+ else
+ stbi__create_png_alpha_expand8(dest, cur, x, img_n);
+ } else if (depth == 16) {
+ // convert the image data from big-endian to platform-native
+ stbi__uint16 *dest16 = (stbi__uint16*)dest;
+ stbi__uint32 nsmp = x*img_n;
+
+ if (img_n == out_n) {
+ for (i = 0; i < nsmp; ++i, ++dest16, cur += 2)
+ *dest16 = (cur[0] << 8) | cur[1];
+ } else {
+ STBI_ASSERT(img_n+1 == out_n);
+ if (img_n == 1) {
+ for (i = 0; i < x; ++i, dest16 += 2, cur += 2) {
+ dest16[0] = (cur[0] << 8) | cur[1];
+ dest16[1] = 0xffff;
+ }
+ } else {
+ STBI_ASSERT(img_n == 3);
+ for (i = 0; i < x; ++i, dest16 += 4, cur += 6) {
+ dest16[0] = (cur[0] << 8) | cur[1];
+ dest16[1] = (cur[2] << 8) | cur[3];
+ dest16[2] = (cur[4] << 8) | cur[5];
+ dest16[3] = 0xffff;
+ }
+ }
+ }
+ }
+ }
+
+ STBI_FREE(filter_buf);
+ if (!all_ok) return 0;
+
+ return 1;
+}
+
+static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
+{
+ int bytes = (depth == 16 ? 2 : 1);
+ int out_bytes = out_n * bytes;
+ stbi_uc *final;
+ int p;
+ if (!interlaced)
+ return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
+
+ // de-interlacing
+ final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
+ if (!final) return stbi__err("outofmem", "Out of memory");
+ for (p=0; p < 7; ++p) {
+ int xorig[] = { 0,4,0,2,0,1,0 };
+ int yorig[] = { 0,0,4,0,2,0,1 };
+ int xspc[] = { 8,8,4,4,2,2,1 };
+ int yspc[] = { 8,8,8,4,4,2,2 };
+ int i,j,x,y;
+ // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
+ x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
+ y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
+ if (x && y) {
+ stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
+ if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
+ STBI_FREE(final);
+ return 0;
+ }
+ for (j=0; j < y; ++j) {
+ for (i=0; i < x; ++i) {
+ int out_y = j*yspc[p]+yorig[p];
+ int out_x = i*xspc[p]+xorig[p];
+ memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
+ a->out + (j*x+i)*out_bytes, out_bytes);
+ }
+ }
+ STBI_FREE(a->out);
+ image_data += img_len;
+ image_data_len -= img_len;
+ }
+ }
+ a->out = final;
+
+ return 1;
+}
+
+static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
+{
+ stbi__context *s = z->s;
+ stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+ stbi_uc *p = z->out;
+
+ // compute color-based transparency, assuming we've
+ // already got 255 as the alpha value in the output
+ STBI_ASSERT(out_n == 2 || out_n == 4);
+
+ if (out_n == 2) {
+ for (i=0; i < pixel_count; ++i) {
+ p[1] = (p[0] == tc[0] ? 0 : 255);
+ p += 2;
+ }
+ } else {
+ for (i=0; i < pixel_count; ++i) {
+ if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
+ p[3] = 0;
+ p += 4;
+ }
+ }
+ return 1;
+}
+
+static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
+{
+ stbi__context *s = z->s;
+ stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+ stbi__uint16 *p = (stbi__uint16*) z->out;
+
+ // compute color-based transparency, assuming we've
+ // already got 65535 as the alpha value in the output
+ STBI_ASSERT(out_n == 2 || out_n == 4);
+
+ if (out_n == 2) {
+ for (i = 0; i < pixel_count; ++i) {
+ p[1] = (p[0] == tc[0] ? 0 : 65535);
+ p += 2;
+ }
+ } else {
+ for (i = 0; i < pixel_count; ++i) {
+ if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
+ p[3] = 0;
+ p += 4;
+ }
+ }
+ return 1;
+}
+
+static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
+{
+ stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
+ stbi_uc *p, *temp_out, *orig = a->out;
+
+ p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
+ if (p == NULL) return stbi__err("outofmem", "Out of memory");
+
+ // between here and free(out) below, exiting would leak
+ temp_out = p;
+
+ if (pal_img_n == 3) {
+ for (i=0; i < pixel_count; ++i) {
+ int n = orig[i]*4;
+ p[0] = palette[n ];
+ p[1] = palette[n+1];
+ p[2] = palette[n+2];
+ p += 3;
+ }
+ } else {
+ for (i=0; i < pixel_count; ++i) {
+ int n = orig[i]*4;
+ p[0] = palette[n ];
+ p[1] = palette[n+1];
+ p[2] = palette[n+2];
+ p[3] = palette[n+3];
+ p += 4;
+ }
+ }
+ STBI_FREE(a->out);
+ a->out = temp_out;
+
+ STBI_NOTUSED(len);
+
+ return 1;
+}
+
+static int stbi__unpremultiply_on_load_global = 0;
+static int stbi__de_iphone_flag_global = 0;
+
+STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
+{
+ stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
+}
+
+STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
+{
+ stbi__de_iphone_flag_global = flag_true_if_should_convert;
+}
+
+#ifndef STBI_THREAD_LOCAL
+#define stbi__unpremultiply_on_load stbi__unpremultiply_on_load_global
+#define stbi__de_iphone_flag stbi__de_iphone_flag_global
+#else
+static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
+static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
+
+STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
+{
+ stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
+ stbi__unpremultiply_on_load_set = 1;
+}
+
+STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
+{
+ stbi__de_iphone_flag_local = flag_true_if_should_convert;
+ stbi__de_iphone_flag_set = 1;
+}
+
+#define stbi__unpremultiply_on_load (stbi__unpremultiply_on_load_set \
+ ? stbi__unpremultiply_on_load_local \
+ : stbi__unpremultiply_on_load_global)
+#define stbi__de_iphone_flag (stbi__de_iphone_flag_set \
+ ? stbi__de_iphone_flag_local \
+ : stbi__de_iphone_flag_global)
+#endif // STBI_THREAD_LOCAL
+
+static void stbi__de_iphone(stbi__png *z)
+{
+ stbi__context *s = z->s;
+ stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+ stbi_uc *p = z->out;
+
+ if (s->img_out_n == 3) { // convert bgr to rgb
+ for (i=0; i < pixel_count; ++i) {
+ stbi_uc t = p[0];
+ p[0] = p[2];
+ p[2] = t;
+ p += 3;
+ }
+ } else {
+ STBI_ASSERT(s->img_out_n == 4);
+ if (stbi__unpremultiply_on_load) {
+ // convert bgr to rgb and unpremultiply
+ for (i=0; i < pixel_count; ++i) {
+ stbi_uc a = p[3];
+ stbi_uc t = p[0];
+ if (a) {
+ stbi_uc half = a / 2;
+ p[0] = (p[2] * 255 + half) / a;
+ p[1] = (p[1] * 255 + half) / a;
+ p[2] = ( t * 255 + half) / a;
+ } else {
+ p[0] = p[2];
+ p[2] = t;
+ }
+ p += 4;
+ }
+ } else {
+ // convert bgr to rgb
+ for (i=0; i < pixel_count; ++i) {
+ stbi_uc t = p[0];
+ p[0] = p[2];
+ p[2] = t;
+ p += 4;
+ }
+ }
+ }
+}
+
+#define STBI__PNG_TYPE(a,b,c,d) (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
+
+static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
+{
+ stbi_uc palette[1024], pal_img_n=0;
+ stbi_uc has_trans=0, tc[3]={0};
+ stbi__uint16 tc16[3];
+ stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
+ int first=1,k,interlace=0, color=0, is_iphone=0;
+ stbi__context *s = z->s;
+
+ z->expanded = NULL;
+ z->idata = NULL;
+ z->out = NULL;
+
+ if (!stbi__check_png_header(s)) return 0;
+
+ if (scan == STBI__SCAN_type) return 1;
+
+ for (;;) {
+ stbi__pngchunk c = stbi__get_chunk_header(s);
+ switch (c.type) {
+ case STBI__PNG_TYPE('C','g','B','I'):
+ is_iphone = 1;
+ stbi__skip(s, c.length);
+ break;
+ case STBI__PNG_TYPE('I','H','D','R'): {
+ int comp,filter;
+ if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
+ first = 0;
+ if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
+ s->img_x = stbi__get32be(s);
+ s->img_y = stbi__get32be(s);
+ if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+ if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+ z->depth = stbi__get8(s); if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16) return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
+ color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype","Corrupt PNG");
+ if (color == 3 && z->depth == 16) return stbi__err("bad ctype","Corrupt PNG");
+ if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
+ comp = stbi__get8(s); if (comp) return stbi__err("bad comp method","Corrupt PNG");
+ filter= stbi__get8(s); if (filter) return stbi__err("bad filter method","Corrupt PNG");
+ interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
+ if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
+ if (!pal_img_n) {
+ s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
+ if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
+ } else {
+ // if paletted, then pal_n is our final components, and
+ // img_n is # components to decompress/filter.
+ s->img_n = 1;
+ if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
+ }
+ // even with SCAN_header, have to scan to see if we have a tRNS
+ break;
+ }
+
+ case STBI__PNG_TYPE('P','L','T','E'): {
+ if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+ if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
+ pal_len = c.length / 3;
+ if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
+ for (i=0; i < pal_len; ++i) {
+ palette[i*4+0] = stbi__get8(s);
+ palette[i*4+1] = stbi__get8(s);
+ palette[i*4+2] = stbi__get8(s);
+ palette[i*4+3] = 255;
+ }
+ break;
+ }
+
+ case STBI__PNG_TYPE('t','R','N','S'): {
+ if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+ if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
+ if (pal_img_n) {
+ if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
+ if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
+ if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
+ pal_img_n = 4;
+ for (i=0; i < c.length; ++i)
+ palette[i*4+3] = stbi__get8(s);
+ } else {
+ if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
+ if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
+ has_trans = 1;
+ // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now.
+ if (scan == STBI__SCAN_header) { ++s->img_n; return 1; }
+ if (z->depth == 16) {
+ for (k = 0; k < s->img_n && k < 3; ++k) // extra loop test to suppress false GCC warning
+ tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
+ } else {
+ for (k = 0; k < s->img_n && k < 3; ++k)
+ tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
+ }
+ }
+ break;
+ }
+
+ case STBI__PNG_TYPE('I','D','A','T'): {
+ if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+ if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
+ if (scan == STBI__SCAN_header) {
+ // header scan definitely stops at first IDAT
+ if (pal_img_n)
+ s->img_n = pal_img_n;
+ return 1;
+ }
+ if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes");
+ if ((int)(ioff + c.length) < (int)ioff) return 0;
+ if (ioff + c.length > idata_limit) {
+ stbi__uint32 idata_limit_old = idata_limit;
+ stbi_uc *p;
+ if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
+ while (ioff + c.length > idata_limit)
+ idata_limit *= 2;
+ STBI_NOTUSED(idata_limit_old);
+ p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
+ z->idata = p;
+ }
+ if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
+ ioff += c.length;
+ break;
+ }
+
+ case STBI__PNG_TYPE('I','E','N','D'): {
+ stbi__uint32 raw_len, bpl;
+ if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+ if (scan != STBI__SCAN_load) return 1;
+ if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
+ // initial guess for decoded data size to avoid unnecessary reallocs
+ bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
+ raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
+ z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
+ if (z->expanded == NULL) return 0; // zlib should set error
+ STBI_FREE(z->idata); z->idata = NULL;
+ if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
+ s->img_out_n = s->img_n+1;
+ else
+ s->img_out_n = s->img_n;
+ if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
+ if (has_trans) {
+ if (z->depth == 16) {
+ if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
+ } else {
+ if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
+ }
+ }
+ if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
+ stbi__de_iphone(z);
+ if (pal_img_n) {
+ // pal_img_n == 3 or 4
+ s->img_n = pal_img_n; // record the actual colors we had
+ s->img_out_n = pal_img_n;
+ if (req_comp >= 3) s->img_out_n = req_comp;
+ if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
+ return 0;
+ } else if (has_trans) {
+ // non-paletted image with tRNS -> source image has (constant) alpha
+ ++s->img_n;
+ }
+ STBI_FREE(z->expanded); z->expanded = NULL;
+ // end of PNG chunk, read and skip CRC
+ stbi__get32be(s);
+ return 1;
+ }
+
+ default:
+ // if critical, fail
+ if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+ if ((c.type & (1 << 29)) == 0) {
+ #ifndef STBI_NO_FAILURE_STRINGS
+ // not threadsafe
+ static char invalid_chunk[] = "XXXX PNG chunk not known";
+ invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
+ invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
+ invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
+ invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
+ #endif
+ return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
+ }
+ stbi__skip(s, c.length);
+ break;
+ }
+ // end of PNG chunk, read and skip CRC
+ stbi__get32be(s);
+ }
+}
+
+static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
+{
+ void *result=NULL;
+ if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
+ if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
+ if (p->depth <= 8)
+ ri->bits_per_channel = 8;
+ else if (p->depth == 16)
+ ri->bits_per_channel = 16;
+ else
+ return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
+ result = p->out;
+ p->out = NULL;
+ if (req_comp && req_comp != p->s->img_out_n) {
+ if (ri->bits_per_channel == 8)
+ result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
+ else
+ result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
+ p->s->img_out_n = req_comp;
+ if (result == NULL) return result;
+ }
+ *x = p->s->img_x;
+ *y = p->s->img_y;
+ if (n) *n = p->s->img_n;
+ }
+ STBI_FREE(p->out); p->out = NULL;
+ STBI_FREE(p->expanded); p->expanded = NULL;
+ STBI_FREE(p->idata); p->idata = NULL;
+
+ return result;
+}
+
+static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ stbi__png p;
+ p.s = s;
+ return stbi__do_png(&p, x,y,comp,req_comp, ri);
+}
+
+static int stbi__png_test(stbi__context *s)
+{
+ int r;
+ r = stbi__check_png_header(s);
+ stbi__rewind(s);
+ return r;
+}
+
+static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
+{
+ if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
+ stbi__rewind( p->s );
+ return 0;
+ }
+ if (x) *x = p->s->img_x;
+ if (y) *y = p->s->img_y;
+ if (comp) *comp = p->s->img_n;
+ return 1;
+}
+
+static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ stbi__png p;
+ p.s = s;
+ return stbi__png_info_raw(&p, x, y, comp);
+}
+
+static int stbi__png_is16(stbi__context *s)
+{
+ stbi__png p;
+ p.s = s;
+ if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
+ return 0;
+ if (p.depth != 16) {
+ stbi__rewind(p.s);
+ return 0;
+ }
+ return 1;
+}
+#endif
+
+// Microsoft/Windows BMP image
+
+#ifndef STBI_NO_BMP
+static int stbi__bmp_test_raw(stbi__context *s)
+{
+ int r;
+ int sz;
+ if (stbi__get8(s) != 'B') return 0;
+ if (stbi__get8(s) != 'M') return 0;
+ stbi__get32le(s); // discard filesize
+ stbi__get16le(s); // discard reserved
+ stbi__get16le(s); // discard reserved
+ stbi__get32le(s); // discard data offset
+ sz = stbi__get32le(s);
+ r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
+ return r;
+}
+
+static int stbi__bmp_test(stbi__context *s)
+{
+ int r = stbi__bmp_test_raw(s);
+ stbi__rewind(s);
+ return r;
+}
+
+
+// returns 0..31 for the highest set bit, or -1 if no bit is set
+static int stbi__high_bit(unsigned int z)
+{
+ int n=0;
+ if (z == 0) return -1;
+ if (z >= 0x10000) { n += 16; z >>= 16; }
+ if (z >= 0x00100) { n += 8; z >>= 8; }
+ if (z >= 0x00010) { n += 4; z >>= 4; }
+ if (z >= 0x00004) { n += 2; z >>= 2; }
+ if (z >= 0x00002) { n += 1;/* >>= 1;*/ }
+ return n;
+}
+
+static int stbi__bitcount(unsigned int a)
+{
+ a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
+ a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
+ a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
+ a = (a + (a >> 8)); // max 16 per 8 bits
+ a = (a + (a >> 16)); // max 32 per 8 bits
+ return a & 0xff;
+}
+
+// extract an arbitrarily-aligned N-bit value (N=bits)
+// from v, and then make it 8-bits long and fractionally
+// extend it to full range.
+static int stbi__shiftsigned(unsigned int v, int shift, int bits)
+{
+ static unsigned int mul_table[9] = {
+ 0,
+ 0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
+ 0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
+ };
+ static unsigned int shift_table[9] = {
+ 0, 0,0,1,0,2,4,6,0,
+ };
+ if (shift < 0)
+ v <<= -shift;
+ else
+ v >>= shift;
+ STBI_ASSERT(v < 256);
+ v >>= (8-bits);
+ STBI_ASSERT(bits >= 0 && bits <= 8);
+ return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
+}
+
+typedef struct
+{
+ int bpp, offset, hsz;
+ unsigned int mr,mg,mb,ma, all_a;
+ int extra_read;
+} stbi__bmp_data;
+
+static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
+{
+ // BI_BITFIELDS specifies masks explicitly, don't override
+ if (compress == 3)
+ return 1;
+
+ if (compress == 0) {
+ if (info->bpp == 16) {
+ info->mr = 31u << 10;
+ info->mg = 31u << 5;
+ info->mb = 31u << 0;
+ } else if (info->bpp == 32) {
+ info->mr = 0xffu << 16;
+ info->mg = 0xffu << 8;
+ info->mb = 0xffu << 0;
+ info->ma = 0xffu << 24;
+ info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
+ } else {
+ // otherwise, use defaults, which is all-0
+ info->mr = info->mg = info->mb = info->ma = 0;
+ }
+ return 1;
+ }
+ return 0; // error
+}
+
+static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
+{
+ int hsz;
+ if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
+ stbi__get32le(s); // discard filesize
+ stbi__get16le(s); // discard reserved
+ stbi__get16le(s); // discard reserved
+ info->offset = stbi__get32le(s);
+ info->hsz = hsz = stbi__get32le(s);
+ info->mr = info->mg = info->mb = info->ma = 0;
+ info->extra_read = 14;
+
+ if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
+
+ if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
+ if (hsz == 12) {
+ s->img_x = stbi__get16le(s);
+ s->img_y = stbi__get16le(s);
+ } else {
+ s->img_x = stbi__get32le(s);
+ s->img_y = stbi__get32le(s);
+ }
+ if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
+ info->bpp = stbi__get16le(s);
+ if (hsz != 12) {
+ int compress = stbi__get32le(s);
+ if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
+ if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
+ if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
+ stbi__get32le(s); // discard sizeof
+ stbi__get32le(s); // discard hres
+ stbi__get32le(s); // discard vres
+ stbi__get32le(s); // discard colorsused
+ stbi__get32le(s); // discard max important
+ if (hsz == 40 || hsz == 56) {
+ if (hsz == 56) {
+ stbi__get32le(s);
+ stbi__get32le(s);
+ stbi__get32le(s);
+ stbi__get32le(s);
+ }
+ if (info->bpp == 16 || info->bpp == 32) {
+ if (compress == 0) {
+ stbi__bmp_set_mask_defaults(info, compress);
+ } else if (compress == 3) {
+ info->mr = stbi__get32le(s);
+ info->mg = stbi__get32le(s);
+ info->mb = stbi__get32le(s);
+ info->extra_read += 12;
+ // not documented, but generated by photoshop and handled by mspaint
+ if (info->mr == info->mg && info->mg == info->mb) {
+ // ?!?!?
+ return stbi__errpuc("bad BMP", "bad BMP");
+ }
+ } else
+ return stbi__errpuc("bad BMP", "bad BMP");
+ }
+ } else {
+ // V4/V5 header
+ int i;
+ if (hsz != 108 && hsz != 124)
+ return stbi__errpuc("bad BMP", "bad BMP");
+ info->mr = stbi__get32le(s);
+ info->mg = stbi__get32le(s);
+ info->mb = stbi__get32le(s);
+ info->ma = stbi__get32le(s);
+ if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
+ stbi__bmp_set_mask_defaults(info, compress);
+ stbi__get32le(s); // discard color space
+ for (i=0; i < 12; ++i)
+ stbi__get32le(s); // discard color space parameters
+ if (hsz == 124) {
+ stbi__get32le(s); // discard rendering intent
+ stbi__get32le(s); // discard offset of profile data
+ stbi__get32le(s); // discard size of profile data
+ stbi__get32le(s); // discard reserved
+ }
+ }
+ }
+ return (void *) 1;
+}
+
+
+static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ stbi_uc *out;
+ unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
+ stbi_uc pal[256][4];
+ int psize=0,i,j,width;
+ int flip_vertically, pad, target;
+ stbi__bmp_data info;
+ STBI_NOTUSED(ri);
+
+ info.all_a = 255;
+ if (stbi__bmp_parse_header(s, &info) == NULL)
+ return NULL; // error code already set
+
+ flip_vertically = ((int) s->img_y) > 0;
+ s->img_y = abs((int) s->img_y);
+
+ if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+ if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+ mr = info.mr;
+ mg = info.mg;
+ mb = info.mb;
+ ma = info.ma;
+ all_a = info.all_a;
+
+ if (info.hsz == 12) {
+ if (info.bpp < 24)
+ psize = (info.offset - info.extra_read - 24) / 3;
+ } else {
+ if (info.bpp < 16)
+ psize = (info.offset - info.extra_read - info.hsz) >> 2;
+ }
+ if (psize == 0) {
+ // accept some number of extra bytes after the header, but if the offset points either to before
+ // the header ends or implies a large amount of extra data, reject the file as malformed
+ int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original);
+ int header_limit = 1024; // max we actually read is below 256 bytes currently.
+ int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size.
+ if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
+ return stbi__errpuc("bad header", "Corrupt BMP");
+ }
+ // we established that bytes_read_so_far is positive and sensible.
+ // the first half of this test rejects offsets that are either too small positives, or
+ // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn
+ // ensures the number computed in the second half of the test can't overflow.
+ if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) {
+ return stbi__errpuc("bad offset", "Corrupt BMP");
+ } else {
+ stbi__skip(s, info.offset - bytes_read_so_far);
+ }
+ }
+
+ if (info.bpp == 24 && ma == 0xff000000)
+ s->img_n = 3;
+ else
+ s->img_n = ma ? 4 : 3;
+ if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
+ target = req_comp;
+ else
+ target = s->img_n; // if they want monochrome, we'll post-convert
+
+ // sanity-check size
+ if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
+ return stbi__errpuc("too large", "Corrupt BMP");
+
+ out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
+ if (!out) return stbi__errpuc("outofmem", "Out of memory");
+ if (info.bpp < 16) {
+ int z=0;
+ if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
+ for (i=0; i < psize; ++i) {
+ pal[i][2] = stbi__get8(s);
+ pal[i][1] = stbi__get8(s);
+ pal[i][0] = stbi__get8(s);
+ if (info.hsz != 12) stbi__get8(s);
+ pal[i][3] = 255;
+ }
+ stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
+ if (info.bpp == 1) width = (s->img_x + 7) >> 3;
+ else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
+ else if (info.bpp == 8) width = s->img_x;
+ else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
+ pad = (-width)&3;
+ if (info.bpp == 1) {
+ for (j=0; j < (int) s->img_y; ++j) {
+ int bit_offset = 7, v = stbi__get8(s);
+ for (i=0; i < (int) s->img_x; ++i) {
+ int color = (v>>bit_offset)&0x1;
+ out[z++] = pal[color][0];
+ out[z++] = pal[color][1];
+ out[z++] = pal[color][2];
+ if (target == 4) out[z++] = 255;
+ if (i+1 == (int) s->img_x) break;
+ if((--bit_offset) < 0) {
+ bit_offset = 7;
+ v = stbi__get8(s);
+ }
+ }
+ stbi__skip(s, pad);
+ }
+ } else {
+ for (j=0; j < (int) s->img_y; ++j) {
+ for (i=0; i < (int) s->img_x; i += 2) {
+ int v=stbi__get8(s),v2=0;
+ if (info.bpp == 4) {
+ v2 = v & 15;
+ v >>= 4;
+ }
+ out[z++] = pal[v][0];
+ out[z++] = pal[v][1];
+ out[z++] = pal[v][2];
+ if (target == 4) out[z++] = 255;
+ if (i+1 == (int) s->img_x) break;
+ v = (info.bpp == 8) ? stbi__get8(s) : v2;
+ out[z++] = pal[v][0];
+ out[z++] = pal[v][1];
+ out[z++] = pal[v][2];
+ if (target == 4) out[z++] = 255;
+ }
+ stbi__skip(s, pad);
+ }
+ }
+ } else {
+ int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
+ int z = 0;
+ int easy=0;
+ stbi__skip(s, info.offset - info.extra_read - info.hsz);
+ if (info.bpp == 24) width = 3 * s->img_x;
+ else if (info.bpp == 16) width = 2*s->img_x;
+ else /* bpp = 32 and pad = 0 */ width=0;
+ pad = (-width) & 3;
+ if (info.bpp == 24) {
+ easy = 1;
+ } else if (info.bpp == 32) {
+ if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
+ easy = 2;
+ }
+ if (!easy) {
+ if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
+ // right shift amt to put high bit in position #7
+ rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
+ gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
+ bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
+ ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
+ if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
+ }
+ for (j=0; j < (int) s->img_y; ++j) {
+ if (easy) {
+ for (i=0; i < (int) s->img_x; ++i) {
+ unsigned char a;
+ out[z+2] = stbi__get8(s);
+ out[z+1] = stbi__get8(s);
+ out[z+0] = stbi__get8(s);
+ z += 3;
+ a = (easy == 2 ? stbi__get8(s) : 255);
+ all_a |= a;
+ if (target == 4) out[z++] = a;
+ }
+ } else {
+ int bpp = info.bpp;
+ for (i=0; i < (int) s->img_x; ++i) {
+ stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
+ unsigned int a;
+ out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
+ out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
+ out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
+ a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
+ all_a |= a;
+ if (target == 4) out[z++] = STBI__BYTECAST(a);
+ }
+ }
+ stbi__skip(s, pad);
+ }
+ }
+
+ // if alpha channel is all 0s, replace with all 255s
+ if (target == 4 && all_a == 0)
+ for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
+ out[i] = 255;
+
+ if (flip_vertically) {
+ stbi_uc t;
+ for (j=0; j < (int) s->img_y>>1; ++j) {
+ stbi_uc *p1 = out + j *s->img_x*target;
+ stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
+ for (i=0; i < (int) s->img_x*target; ++i) {
+ t = p1[i]; p1[i] = p2[i]; p2[i] = t;
+ }
+ }
+ }
+
+ if (req_comp && req_comp != target) {
+ out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
+ if (out == NULL) return out; // stbi__convert_format frees input on failure
+ }
+
+ *x = s->img_x;
+ *y = s->img_y;
+ if (comp) *comp = s->img_n;
+ return out;
+}
+#endif
+
+// Targa Truevision - TGA
+// by Jonathan Dummer
+#ifndef STBI_NO_TGA
+// returns the component count (STBI_grey, STBI_grey_alpha, STBI_rgb, or STBI_rgba), 0 on error
+static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
+{
+ // only RGB or RGBA (incl. 16bit) or grey allowed
+ if (is_rgb16) *is_rgb16 = 0;
+ switch(bits_per_pixel) {
+ case 8: return STBI_grey;
+ case 16: if(is_grey) return STBI_grey_alpha;
+ // fallthrough
+ case 15: if(is_rgb16) *is_rgb16 = 1;
+ return STBI_rgb;
+ case 24: // fallthrough
+ case 32: return bits_per_pixel/8;
+ default: return 0;
+ }
+}
+
+static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
+ int sz, tga_colormap_type;
+ stbi__get8(s); // discard Offset
+ tga_colormap_type = stbi__get8(s); // colormap type
+ if( tga_colormap_type > 1 ) {
+ stbi__rewind(s);
+ return 0; // only RGB or indexed allowed
+ }
+ tga_image_type = stbi__get8(s); // image type
+ if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
+ if (tga_image_type != 1 && tga_image_type != 9) {
+ stbi__rewind(s);
+ return 0;
+ }
+ stbi__skip(s,4); // skip index of first colormap entry and number of entries
+ sz = stbi__get8(s); // check bits per palette color entry
+ if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
+ stbi__rewind(s);
+ return 0;
+ }
+ stbi__skip(s,4); // skip image x and y origin
+ tga_colormap_bpp = sz;
+ } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
+ if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
+ stbi__rewind(s);
+ return 0; // only RGB or grey allowed, +/- RLE
+ }
+ stbi__skip(s,9); // skip colormap specification and image x/y origin
+ tga_colormap_bpp = 0;
+ }
+ tga_w = stbi__get16le(s);
+ if( tga_w < 1 ) {
+ stbi__rewind(s);
+ return 0; // test width
+ }
+ tga_h = stbi__get16le(s);
+ if( tga_h < 1 ) {
+ stbi__rewind(s);
+ return 0; // test height
+ }
+ tga_bits_per_pixel = stbi__get8(s); // bits per pixel
+ stbi__get8(s); // ignore alpha bits
+ if (tga_colormap_bpp != 0) {
+ if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
+ // when using a colormap, tga_bits_per_pixel is the size of the indexes
+ // I don't think anything but 8 or 16bit indexes makes sense
+ stbi__rewind(s);
+ return 0;
+ }
+ tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
+ } else {
+ tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
+ }
+ if(!tga_comp) {
+ stbi__rewind(s);
+ return 0;
+ }
+ if (x) *x = tga_w;
+ if (y) *y = tga_h;
+ if (comp) *comp = tga_comp;
+ return 1; // seems to have passed everything
+}
+
+static int stbi__tga_test(stbi__context *s)
+{
+ int res = 0;
+ int sz, tga_color_type;
+ stbi__get8(s); // discard Offset
+ tga_color_type = stbi__get8(s); // color type
+ if ( tga_color_type > 1 ) goto errorEnd; // only RGB or indexed allowed
+ sz = stbi__get8(s); // image type
+ if ( tga_color_type == 1 ) { // colormapped (paletted) image
+ if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
+ stbi__skip(s,4); // skip index of first colormap entry and number of entries
+ sz = stbi__get8(s); // check bits per palette color entry
+ if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
+ stbi__skip(s,4); // skip image x and y origin
+ } else { // "normal" image w/o colormap
+ if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
+ stbi__skip(s,9); // skip colormap specification and image x/y origin
+ }
+ if ( stbi__get16le(s) < 1 ) goto errorEnd; // test width
+ if ( stbi__get16le(s) < 1 ) goto errorEnd; // test height
+ sz = stbi__get8(s); // bits per pixel
+ if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
+ if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
+
+ res = 1; // if we got this far, everything's good and we can return 1 instead of 0
+
+errorEnd:
+ stbi__rewind(s);
+ return res;
+}
+
+// read 16bit value and convert to 24bit RGB
+static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
+{
+ stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
+ stbi__uint16 fiveBitMask = 31;
+ // we have 3 channels with 5bits each
+ int r = (px >> 10) & fiveBitMask;
+ int g = (px >> 5) & fiveBitMask;
+ int b = px & fiveBitMask;
+ // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
+ out[0] = (stbi_uc)((r * 255)/31);
+ out[1] = (stbi_uc)((g * 255)/31);
+ out[2] = (stbi_uc)((b * 255)/31);
+
+ // some people claim that the most significant bit might be used for alpha
+ // (possibly if an alpha-bit is set in the "image descriptor byte")
+ // but that only made 16bit test images completely translucent,
+ // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
+}
+
+static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ // read in the TGA header stuff
+ int tga_offset = stbi__get8(s);
+ int tga_indexed = stbi__get8(s);
+ int tga_image_type = stbi__get8(s);
+ int tga_is_RLE = 0;
+ int tga_palette_start = stbi__get16le(s);
+ int tga_palette_len = stbi__get16le(s);
+ int tga_palette_bits = stbi__get8(s);
+ int tga_x_origin = stbi__get16le(s);
+ int tga_y_origin = stbi__get16le(s);
+ int tga_width = stbi__get16le(s);
+ int tga_height = stbi__get16le(s);
+ int tga_bits_per_pixel = stbi__get8(s);
+ int tga_comp, tga_rgb16=0;
+ int tga_inverted = stbi__get8(s);
+ // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
+ // image data
+ unsigned char *tga_data;
+ unsigned char *tga_palette = NULL;
+ int i, j;
+ unsigned char raw_data[4] = {0};
+ int RLE_count = 0;
+ int RLE_repeating = 0;
+ int read_next_pixel = 1;
+ STBI_NOTUSED(ri);
+ STBI_NOTUSED(tga_x_origin); // @TODO
+ STBI_NOTUSED(tga_y_origin); // @TODO
+
+ if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+ if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+ // do a tiny bit of processing
+ if ( tga_image_type >= 8 )
+ {
+ tga_image_type -= 8;
+ tga_is_RLE = 1;
+ }
+ tga_inverted = 1 - ((tga_inverted >> 5) & 1);
+
+ // If I'm paletted, then I'll use the number of bits from the palette
+ if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
+ else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
+
+ if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
+ return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
+
+ // tga info
+ *x = tga_width;
+ *y = tga_height;
+ if (comp) *comp = tga_comp;
+
+ if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
+ return stbi__errpuc("too large", "Corrupt TGA");
+
+ tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
+ if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
+
+ // skip to the data's starting position (offset usually = 0)
+ stbi__skip(s, tga_offset );
+
+ if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
+ for (i=0; i < tga_height; ++i) {
+ int row = tga_inverted ? tga_height -i - 1 : i;
+ stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
+ stbi__getn(s, tga_row, tga_width * tga_comp);
+ }
+ } else {
+ // do I need to load a palette?
+ if ( tga_indexed)
+ {
+ if (tga_palette_len == 0) { /* you have to have at least one entry! */
+ STBI_FREE(tga_data);
+ return stbi__errpuc("bad palette", "Corrupt TGA");
+ }
+
+ // any data to skip? (offset usually = 0)
+ stbi__skip(s, tga_palette_start );
+ // load the palette
+ tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
+ if (!tga_palette) {
+ STBI_FREE(tga_data);
+ return stbi__errpuc("outofmem", "Out of memory");
+ }
+ if (tga_rgb16) {
+ stbi_uc *pal_entry = tga_palette;
+ STBI_ASSERT(tga_comp == STBI_rgb);
+ for (i=0; i < tga_palette_len; ++i) {
+ stbi__tga_read_rgb16(s, pal_entry);
+ pal_entry += tga_comp;
+ }
+ } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
+ STBI_FREE(tga_data);
+ STBI_FREE(tga_palette);
+ return stbi__errpuc("bad palette", "Corrupt TGA");
+ }
+ }
+ // load the data
+ for (i=0; i < tga_width * tga_height; ++i)
+ {
+ // if I'm in RLE mode, do I need to get an RLE packet?
+ if ( tga_is_RLE )
+ {
+ if ( RLE_count == 0 )
+ {
+ // yep, get the next byte as a RLE command
+ int RLE_cmd = stbi__get8(s);
+ RLE_count = 1 + (RLE_cmd & 127);
+ RLE_repeating = RLE_cmd >> 7;
+ read_next_pixel = 1;
+ } else if ( !RLE_repeating )
+ {
+ read_next_pixel = 1;
+ }
+ } else
+ {
+ read_next_pixel = 1;
+ }
+ // OK, if I need to read a pixel, do it now
+ if ( read_next_pixel )
+ {
+ // load however much data we did have
+ if ( tga_indexed )
+ {
+ // read in index, then perform the lookup
+ int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
+ if ( pal_idx >= tga_palette_len ) {
+ // invalid index
+ pal_idx = 0;
+ }
+ pal_idx *= tga_comp;
+ for (j = 0; j < tga_comp; ++j) {
+ raw_data[j] = tga_palette[pal_idx+j];
+ }
+ } else if(tga_rgb16) {
+ STBI_ASSERT(tga_comp == STBI_rgb);
+ stbi__tga_read_rgb16(s, raw_data);
+ } else {
+ // read in the data raw
+ for (j = 0; j < tga_comp; ++j) {
+ raw_data[j] = stbi__get8(s);
+ }
+ }
+ // clear the reading flag for the next pixel
+ read_next_pixel = 0;
+ } // end of reading a pixel
+
+ // copy data
+ for (j = 0; j < tga_comp; ++j)
+ tga_data[i*tga_comp+j] = raw_data[j];
+
+ // in case we're in RLE mode, keep counting down
+ --RLE_count;
+ }
+ // do I need to invert the image?
+ if ( tga_inverted )
+ {
+ for (j = 0; j*2 < tga_height; ++j)
+ {
+ int index1 = j * tga_width * tga_comp;
+ int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
+ for (i = tga_width * tga_comp; i > 0; --i)
+ {
+ unsigned char temp = tga_data[index1];
+ tga_data[index1] = tga_data[index2];
+ tga_data[index2] = temp;
+ ++index1;
+ ++index2;
+ }
+ }
+ }
+ // clear my palette, if I had one
+ if ( tga_palette != NULL )
+ {
+ STBI_FREE( tga_palette );
+ }
+ }
+
+ // swap RGB - if the source data was RGB16, it already is in the right order
+ if (tga_comp >= 3 && !tga_rgb16)
+ {
+ unsigned char* tga_pixel = tga_data;
+ for (i=0; i < tga_width * tga_height; ++i)
+ {
+ unsigned char temp = tga_pixel[0];
+ tga_pixel[0] = tga_pixel[2];
+ tga_pixel[2] = temp;
+ tga_pixel += tga_comp;
+ }
+ }
+
+ // convert to target component count
+ if (req_comp && req_comp != tga_comp)
+ tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
+
+ // the things I do to get rid of an error message, and yet keep
+ // Microsoft's C compilers happy... [8^(
+ tga_palette_start = tga_palette_len = tga_palette_bits =
+ tga_x_origin = tga_y_origin = 0;
+ STBI_NOTUSED(tga_palette_start);
+ // OK, done
+ return tga_data;
+}
+#endif
+
+// *************************************************************************************************
+// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
+
+#ifndef STBI_NO_PSD
+static int stbi__psd_test(stbi__context *s)
+{
+ int r = (stbi__get32be(s) == 0x38425053);
+ stbi__rewind(s);
+ return r;
+}
+
+static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
+{
+ int count, nleft, len;
+
+ count = 0;
+ while ((nleft = pixelCount - count) > 0) {
+ len = stbi__get8(s);
+ if (len == 128) {
+ // No-op.
+ } else if (len < 128) {
+ // Copy next len+1 bytes literally.
+ len++;
+ if (len > nleft) return 0; // corrupt data
+ count += len;
+ while (len) {
+ *p = stbi__get8(s);
+ p += 4;
+ len--;
+ }
+ } else if (len > 128) {
+ stbi_uc val;
+ // Next -len+1 bytes in the dest are replicated from next source byte.
+ // (Interpret len as a negative 8-bit int.)
+ len = 257 - len;
+ if (len > nleft) return 0; // corrupt data
+ val = stbi__get8(s);
+ count += len;
+ while (len) {
+ *p = val;
+ p += 4;
+ len--;
+ }
+ }
+ }
+
+ return 1;
+}
+
+static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
+{
+ int pixelCount;
+ int channelCount, compression;
+ int channel, i;
+ int bitdepth;
+ int w,h;
+ stbi_uc *out;
+ STBI_NOTUSED(ri);
+
+ // Check identifier
+ if (stbi__get32be(s) != 0x38425053) // "8BPS"
+ return stbi__errpuc("not PSD", "Corrupt PSD image");
+
+ // Check file type version.
+ if (stbi__get16be(s) != 1)
+ return stbi__errpuc("wrong version", "Unsupported version of PSD image");
+
+ // Skip 6 reserved bytes.
+ stbi__skip(s, 6 );
+
+ // Read the number of channels (R, G, B, A, etc).
+ channelCount = stbi__get16be(s);
+ if (channelCount < 0 || channelCount > 16)
+ return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
+
+ // Read the rows and columns of the image.
+ h = stbi__get32be(s);
+ w = stbi__get32be(s);
+
+ if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+ if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+   // Make sure the depth is 8 or 16 bits.
+ bitdepth = stbi__get16be(s);
+ if (bitdepth != 8 && bitdepth != 16)
+ return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
+
+ // Make sure the color mode is RGB.
+ // Valid options are:
+ // 0: Bitmap
+ // 1: Grayscale
+ // 2: Indexed color
+ // 3: RGB color
+ // 4: CMYK color
+ // 7: Multichannel
+ // 8: Duotone
+ // 9: Lab color
+ if (stbi__get16be(s) != 3)
+ return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
+
+ // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
+ stbi__skip(s,stbi__get32be(s) );
+
+ // Skip the image resources. (resolution, pen tool paths, etc)
+ stbi__skip(s, stbi__get32be(s) );
+
+ // Skip the reserved data.
+ stbi__skip(s, stbi__get32be(s) );
+
+ // Find out if the data is compressed.
+ // Known values:
+ // 0: no compression
+ // 1: RLE compressed
+ compression = stbi__get16be(s);
+ if (compression > 1)
+ return stbi__errpuc("bad compression", "PSD has an unknown compression format");
+
+ // Check size
+ if (!stbi__mad3sizes_valid(4, w, h, 0))
+ return stbi__errpuc("too large", "Corrupt PSD");
+
+ // Create the destination image.
+
+ if (!compression && bitdepth == 16 && bpc == 16) {
+ out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
+ ri->bits_per_channel = 16;
+ } else
+ out = (stbi_uc *) stbi__malloc(4 * w*h);
+
+ if (!out) return stbi__errpuc("outofmem", "Out of memory");
+ pixelCount = w*h;
+
+ // Initialize the data to zero.
+ //memset( out, 0, pixelCount * 4 );
+
+ // Finally, the image data.
+ if (compression) {
+ // RLE as used by .PSD and .TIFF
+ // Loop until you get the number of unpacked bytes you are expecting:
+ // Read the next source byte into n.
+ // If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
+ // Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
+ // Else if n is 128, noop.
+ // Endloop
+
+ // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
+ // which we're going to just skip.
+ stbi__skip(s, h * channelCount * 2 );
+
+ // Read the RLE data by channel.
+ for (channel = 0; channel < 4; channel++) {
+ stbi_uc *p;
+
+ p = out+channel;
+ if (channel >= channelCount) {
+ // Fill this channel with default data.
+ for (i = 0; i < pixelCount; i++, p += 4)
+ *p = (channel == 3 ? 255 : 0);
+ } else {
+ // Read the RLE data.
+ if (!stbi__psd_decode_rle(s, p, pixelCount)) {
+ STBI_FREE(out);
+ return stbi__errpuc("corrupt", "bad RLE data");
+ }
+ }
+ }
+
+ } else {
+ // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
+ // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
+
+ // Read the data by channel.
+ for (channel = 0; channel < 4; channel++) {
+ if (channel >= channelCount) {
+ // Fill this channel with default data.
+ if (bitdepth == 16 && bpc == 16) {
+ stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
+ stbi__uint16 val = channel == 3 ? 65535 : 0;
+ for (i = 0; i < pixelCount; i++, q += 4)
+ *q = val;
+ } else {
+ stbi_uc *p = out+channel;
+ stbi_uc val = channel == 3 ? 255 : 0;
+ for (i = 0; i < pixelCount; i++, p += 4)
+ *p = val;
+ }
+ } else {
+ if (ri->bits_per_channel == 16) { // output bpc
+ stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
+ for (i = 0; i < pixelCount; i++, q += 4)
+ *q = (stbi__uint16) stbi__get16be(s);
+ } else {
+ stbi_uc *p = out+channel;
+ if (bitdepth == 16) { // input bpc
+ for (i = 0; i < pixelCount; i++, p += 4)
+ *p = (stbi_uc) (stbi__get16be(s) >> 8);
+ } else {
+ for (i = 0; i < pixelCount; i++, p += 4)
+ *p = stbi__get8(s);
+ }
+ }
+ }
+ }
+ }
+
+ // remove weird white matte from PSD
+ if (channelCount >= 4) {
+ if (ri->bits_per_channel == 16) {
+ for (i=0; i < w*h; ++i) {
+ stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
+ if (pixel[3] != 0 && pixel[3] != 65535) {
+ float a = pixel[3] / 65535.0f;
+ float ra = 1.0f / a;
+ float inv_a = 65535.0f * (1 - ra);
+ pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
+ pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
+ pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
+ }
+ }
+ } else {
+ for (i=0; i < w*h; ++i) {
+ unsigned char *pixel = out + 4*i;
+ if (pixel[3] != 0 && pixel[3] != 255) {
+ float a = pixel[3] / 255.0f;
+ float ra = 1.0f / a;
+ float inv_a = 255.0f * (1 - ra);
+ pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
+ pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
+ pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
+ }
+ }
+ }
+ }
+
+ // convert to desired output format
+ if (req_comp && req_comp != 4) {
+ if (ri->bits_per_channel == 16)
+ out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
+ else
+ out = stbi__convert_format(out, 4, req_comp, w, h);
+ if (out == NULL) return out; // stbi__convert_format frees input on failure
+ }
+
+ if (comp) *comp = 4;
+ *y = h;
+ *x = w;
+
+ return out;
+}
+#endif
+
+// *************************************************************************************************
+// Softimage PIC loader
+// by Tom Seddon
+//
+// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
+// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
+
+#ifndef STBI_NO_PIC
+static int stbi__pic_is4(stbi__context *s,const char *str)
+{
+ int i;
+ for (i=0; i<4; ++i)
+ if (stbi__get8(s) != (stbi_uc)str[i])
+ return 0;
+
+ return 1;
+}
+
+static int stbi__pic_test_core(stbi__context *s)
+{
+ int i;
+
+ if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
+ return 0;
+
+ for(i=0;i<84;++i)
+ stbi__get8(s);
+
+ if (!stbi__pic_is4(s,"PICT"))
+ return 0;
+
+ return 1;
+}
+
+typedef struct
+{
+ stbi_uc size,type,channel;
+} stbi__pic_packet;
+
+static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
+{
+ int mask=0x80, i;
+
+ for (i=0; i<4; ++i, mask>>=1) {
+ if (channel & mask) {
+ if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
+ dest[i]=stbi__get8(s);
+ }
+ }
+
+ return dest;
+}
+
+static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
+{
+ int mask=0x80,i;
+
+ for (i=0;i<4; ++i, mask>>=1)
+ if (channel&mask)
+ dest[i]=src[i];
+}
+
+static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
+{
+ int act_comp=0,num_packets=0,y,chained;
+ stbi__pic_packet packets[10];
+
+ // this will (should...) cater for even some bizarre stuff like having data
+ // for the same channel in multiple packets.
+ do {
+ stbi__pic_packet *packet;
+
+ if (num_packets==sizeof(packets)/sizeof(packets[0]))
+ return stbi__errpuc("bad format","too many packets");
+
+ packet = &packets[num_packets++];
+
+ chained = stbi__get8(s);
+ packet->size = stbi__get8(s);
+ packet->type = stbi__get8(s);
+ packet->channel = stbi__get8(s);
+
+ act_comp |= packet->channel;
+
+ if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (reading packets)");
+ if (packet->size != 8) return stbi__errpuc("bad format","packet isn't 8bpp");
+ } while (chained);
+
+ *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
+
+   for(y=0; y<height; ++y) {
+      int packet_idx;
+
+      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
+         stbi__pic_packet *packet = &packets[packet_idx];
+         stbi_uc *dest = result+y*width*4;
+
+         switch (packet->type) {
+ default:
+ return stbi__errpuc("bad format","packet has bad compression type");
+
+ case 0: {//uncompressed
+ int x;
+
+               for(x=0;x<width;++x, dest+=4)
+                  if (!stbi__readval(s,packet->channel,dest))
+ return 0;
+ break;
+ }
+
+ case 1://Pure RLE
+ {
+ int left=width, i;
+
+ while (left>0) {
+ stbi_uc count,value[4];
+
+ count=stbi__get8(s);
+ if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pure read count)");
+
+ if (count > left)
+ count = (stbi_uc) left;
+
+ if (!stbi__readval(s,packet->channel,value)) return 0;
+
+                     for(i=0; i<count; ++i, dest+=4)
+                        stbi__copyval(packet->channel,dest,value);
+ left -= count;
+ }
+ }
+ break;
+
+ case 2: {//Mixed RLE
+ int left=width;
+ while (left>0) {
+ int count = stbi__get8(s), i;
+ if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (mixed read count)");
+
+ if (count >= 128) { // Repeated
+ stbi_uc value[4];
+
+ if (count==128)
+ count = stbi__get16be(s);
+ else
+ count -= 127;
+ if (count > left)
+ return stbi__errpuc("bad file","scanline overrun");
+
+ if (!stbi__readval(s,packet->channel,value))
+ return 0;
+
+                     for(i=0;i<count;++i, dest += 4)
+                        stbi__copyval(packet->channel,dest,value);
+ } else { // Raw
+ ++count;
+ if (count>left) return stbi__errpuc("bad file","scanline overrun");
+
+                     for(i=0;i<count;++i, dest+=4)
+                        if (!stbi__readval(s,packet->channel,dest))
+ return 0;
+ }
+ left-=count;
+ }
+ break;
+ }
+ }
+ }
+ }
+
+ return result;
+}
+
+static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
+{
+ stbi_uc *result;
+ int i, x,y, internal_comp;
+ STBI_NOTUSED(ri);
+
+ if (!comp) comp = &internal_comp;
+
+ for (i=0; i<92; ++i)
+ stbi__get8(s);
+
+ x = stbi__get16be(s);
+ y = stbi__get16be(s);
+
+ if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+ if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+ if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pic header)");
+ if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
+
+ stbi__get32be(s); //skip `ratio'
+ stbi__get16be(s); //skip `fields'
+ stbi__get16be(s); //skip `pad'
+
+ // intermediate buffer is RGBA
+ result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
+ if (!result) return stbi__errpuc("outofmem", "Out of memory");
+ memset(result, 0xff, x*y*4);
+
+ if (!stbi__pic_load_core(s,x,y,comp, result)) {
+ STBI_FREE(result);
+ result=0;
+ }
+ *px = x;
+ *py = y;
+ if (req_comp == 0) req_comp = *comp;
+ result=stbi__convert_format(result,4,req_comp,x,y);
+
+ return result;
+}
+
+static int stbi__pic_test(stbi__context *s)
+{
+ int r = stbi__pic_test_core(s);
+ stbi__rewind(s);
+ return r;
+}
+#endif
+
+// *************************************************************************************************
+// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
+
+#ifndef STBI_NO_GIF
+typedef struct
+{
+ stbi__int16 prefix;
+ stbi_uc first;
+ stbi_uc suffix;
+} stbi__gif_lzw;
+
+typedef struct
+{
+ int w,h;
+ stbi_uc *out; // output buffer (always 4 components)
+ stbi_uc *background; // The current "background" as far as a gif is concerned
+ stbi_uc *history;
+ int flags, bgindex, ratio, transparent, eflags;
+ stbi_uc pal[256][4];
+ stbi_uc lpal[256][4];
+ stbi__gif_lzw codes[8192];
+ stbi_uc *color_table;
+ int parse, step;
+ int lflags;
+ int start_x, start_y;
+ int max_x, max_y;
+ int cur_x, cur_y;
+ int line_size;
+ int delay;
+} stbi__gif;
+
+static int stbi__gif_test_raw(stbi__context *s)
+{
+ int sz;
+ if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
+ sz = stbi__get8(s);
+ if (sz != '9' && sz != '7') return 0;
+ if (stbi__get8(s) != 'a') return 0;
+ return 1;
+}
+
+static int stbi__gif_test(stbi__context *s)
+{
+ int r = stbi__gif_test_raw(s);
+ stbi__rewind(s);
+ return r;
+}
+
+static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
+{
+ int i;
+ for (i=0; i < num_entries; ++i) {
+ pal[i][2] = stbi__get8(s);
+ pal[i][1] = stbi__get8(s);
+ pal[i][0] = stbi__get8(s);
+ pal[i][3] = transp == i ? 0 : 255;
+ }
+}
+
+static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
+{
+ stbi_uc version;
+ if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
+ return stbi__err("not GIF", "Corrupt GIF");
+
+ version = stbi__get8(s);
+ if (version != '7' && version != '9') return stbi__err("not GIF", "Corrupt GIF");
+ if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF");
+
+ stbi__g_failure_reason = "";
+ g->w = stbi__get16le(s);
+ g->h = stbi__get16le(s);
+ g->flags = stbi__get8(s);
+ g->bgindex = stbi__get8(s);
+ g->ratio = stbi__get8(s);
+ g->transparent = -1;
+
+ if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+ if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+
+ if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
+
+ if (is_info) return 1;
+
+ if (g->flags & 0x80)
+ stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
+
+ return 1;
+}
+
+static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
+{
+ stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
+ if (!g) return stbi__err("outofmem", "Out of memory");
+ if (!stbi__gif_header(s, g, comp, 1)) {
+ STBI_FREE(g);
+ stbi__rewind( s );
+ return 0;
+ }
+ if (x) *x = g->w;
+ if (y) *y = g->h;
+ STBI_FREE(g);
+ return 1;
+}
+
+static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
+{
+ stbi_uc *p, *c;
+ int idx;
+
+ // recurse to decode the prefixes, since the linked-list is backwards,
+ // and working backwards through an interleaved image would be nasty
+ if (g->codes[code].prefix >= 0)
+ stbi__out_gif_code(g, g->codes[code].prefix);
+
+ if (g->cur_y >= g->max_y) return;
+
+ idx = g->cur_x + g->cur_y;
+ p = &g->out[idx];
+ g->history[idx / 4] = 1;
+
+ c = &g->color_table[g->codes[code].suffix * 4];
+ if (c[3] > 128) { // don't render transparent pixels;
+ p[0] = c[2];
+ p[1] = c[1];
+ p[2] = c[0];
+ p[3] = c[3];
+ }
+ g->cur_x += 4;
+
+ if (g->cur_x >= g->max_x) {
+ g->cur_x = g->start_x;
+ g->cur_y += g->step;
+
+ while (g->cur_y >= g->max_y && g->parse > 0) {
+ g->step = (1 << g->parse) * g->line_size;
+ g->cur_y = g->start_y + (g->step >> 1);
+ --g->parse;
+ }
+ }
+}
+
+static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
+{
+ stbi_uc lzw_cs;
+ stbi__int32 len, init_code;
+ stbi__uint32 first;
+ stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
+ stbi__gif_lzw *p;
+
+ lzw_cs = stbi__get8(s);
+ if (lzw_cs > 12) return NULL;
+ clear = 1 << lzw_cs;
+ first = 1;
+ codesize = lzw_cs + 1;
+ codemask = (1 << codesize) - 1;
+ bits = 0;
+ valid_bits = 0;
+ for (init_code = 0; init_code < clear; init_code++) {
+ g->codes[init_code].prefix = -1;
+ g->codes[init_code].first = (stbi_uc) init_code;
+ g->codes[init_code].suffix = (stbi_uc) init_code;
+ }
+
+ // support no starting clear code
+ avail = clear+2;
+ oldcode = -1;
+
+ len = 0;
+ for(;;) {
+ if (valid_bits < codesize) {
+ if (len == 0) {
+ len = stbi__get8(s); // start new block
+ if (len == 0)
+ return g->out;
+ }
+ --len;
+ bits |= (stbi__int32) stbi__get8(s) << valid_bits;
+ valid_bits += 8;
+ } else {
+ stbi__int32 code = bits & codemask;
+ bits >>= codesize;
+ valid_bits -= codesize;
+ // @OPTIMIZE: is there some way we can accelerate the non-clear path?
+ if (code == clear) { // clear code
+ codesize = lzw_cs + 1;
+ codemask = (1 << codesize) - 1;
+ avail = clear + 2;
+ oldcode = -1;
+ first = 0;
+ } else if (code == clear + 1) { // end of stream code
+ stbi__skip(s, len);
+ while ((len = stbi__get8(s)) > 0)
+ stbi__skip(s,len);
+ return g->out;
+ } else if (code <= avail) {
+ if (first) {
+ return stbi__errpuc("no clear code", "Corrupt GIF");
+ }
+
+ if (oldcode >= 0) {
+ p = &g->codes[avail++];
+ if (avail > 8192) {
+ return stbi__errpuc("too many codes", "Corrupt GIF");
+ }
+
+ p->prefix = (stbi__int16) oldcode;
+ p->first = g->codes[oldcode].first;
+ p->suffix = (code == avail) ? p->first : g->codes[code].first;
+ } else if (code == avail)
+ return stbi__errpuc("illegal code in raster", "Corrupt GIF");
+
+ stbi__out_gif_code(g, (stbi__uint16) code);
+
+ if ((avail & codemask) == 0 && avail <= 0x0FFF) {
+ codesize++;
+ codemask = (1 << codesize) - 1;
+ }
+
+ oldcode = code;
+ } else {
+ return stbi__errpuc("illegal code in raster", "Corrupt GIF");
+ }
+ }
+ }
+}
+
+// this function is designed to support animated gifs, although stb_image doesn't support it
+// two back is the image from two frames ago, used for a very specific disposal format
+static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
+{
+ int dispose;
+ int first_frame;
+ int pi;
+ int pcount;
+ STBI_NOTUSED(req_comp);
+
+ // on first frame, any non-written pixels get the background colour (non-transparent)
+ first_frame = 0;
+ if (g->out == 0) {
+ if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
+ if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
+ return stbi__errpuc("too large", "GIF image is too large");
+ pcount = g->w * g->h;
+ g->out = (stbi_uc *) stbi__malloc(4 * pcount);
+ g->background = (stbi_uc *) stbi__malloc(4 * pcount);
+ g->history = (stbi_uc *) stbi__malloc(pcount);
+ if (!g->out || !g->background || !g->history)
+ return stbi__errpuc("outofmem", "Out of memory");
+
+ // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
+ // background colour is only used for pixels that are not rendered first frame, after that "background"
+ // color refers to the color that was there the previous frame.
+ memset(g->out, 0x00, 4 * pcount);
+ memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
+ memset(g->history, 0x00, pcount); // pixels that were affected previous frame
+ first_frame = 1;
+ } else {
+ // second frame - how do we dispose of the previous one?
+ dispose = (g->eflags & 0x1C) >> 2;
+ pcount = g->w * g->h;
+
+ if ((dispose == 3) && (two_back == 0)) {
+ dispose = 2; // if I don't have an image to revert back to, default to the old background
+ }
+
+ if (dispose == 3) { // use previous graphic
+ for (pi = 0; pi < pcount; ++pi) {
+ if (g->history[pi]) {
+ memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
+ }
+ }
+ } else if (dispose == 2) {
+ // restore what was changed last frame to background before that frame;
+ for (pi = 0; pi < pcount; ++pi) {
+ if (g->history[pi]) {
+ memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
+ }
+ }
+ } else {
+         // This is a non-disposal case either way, so just
+ // leave the pixels as is, and they will become the new background
+ // 1: do not dispose
+ // 0: not specified.
+ }
+
+      // background is what out is after the undoing of the previous frame;
+ memcpy( g->background, g->out, 4 * g->w * g->h );
+ }
+
+ // clear my history;
+ memset( g->history, 0x00, g->w * g->h ); // pixels that were affected previous frame
+
+ for (;;) {
+ int tag = stbi__get8(s);
+ switch (tag) {
+ case 0x2C: /* Image Descriptor */
+ {
+ stbi__int32 x, y, w, h;
+ stbi_uc *o;
+
+ x = stbi__get16le(s);
+ y = stbi__get16le(s);
+ w = stbi__get16le(s);
+ h = stbi__get16le(s);
+ if (((x + w) > (g->w)) || ((y + h) > (g->h)))
+ return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
+
+ g->line_size = g->w * 4;
+ g->start_x = x * 4;
+ g->start_y = y * g->line_size;
+ g->max_x = g->start_x + w * 4;
+ g->max_y = g->start_y + h * g->line_size;
+ g->cur_x = g->start_x;
+ g->cur_y = g->start_y;
+
+ // if the width of the specified rectangle is 0, that means
+ // we may not see *any* pixels or the image is malformed;
+ // to make sure this is caught, move the current y down to
+ // max_y (which is what out_gif_code checks).
+ if (w == 0)
+ g->cur_y = g->max_y;
+
+ g->lflags = stbi__get8(s);
+
+ if (g->lflags & 0x40) {
+ g->step = 8 * g->line_size; // first interlaced spacing
+ g->parse = 3;
+ } else {
+ g->step = g->line_size;
+ g->parse = 0;
+ }
+
+ if (g->lflags & 0x80) {
+ stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
+ g->color_table = (stbi_uc *) g->lpal;
+ } else if (g->flags & 0x80) {
+ g->color_table = (stbi_uc *) g->pal;
+ } else
+ return stbi__errpuc("missing color table", "Corrupt GIF");
+
+ o = stbi__process_gif_raster(s, g);
+ if (!o) return NULL;
+
+ // if this was the first frame,
+ pcount = g->w * g->h;
+ if (first_frame && (g->bgindex > 0)) {
+ // if first frame, any pixel not drawn to gets the background color
+ for (pi = 0; pi < pcount; ++pi) {
+ if (g->history[pi] == 0) {
+ g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
+ memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
+ }
+ }
+ }
+
+ return o;
+ }
+
+ case 0x21: // Comment Extension.
+ {
+ int len;
+ int ext = stbi__get8(s);
+ if (ext == 0xF9) { // Graphic Control Extension.
+ len = stbi__get8(s);
+ if (len == 4) {
+ g->eflags = stbi__get8(s);
+ g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
+
+ // unset old transparent
+ if (g->transparent >= 0) {
+ g->pal[g->transparent][3] = 255;
+ }
+ if (g->eflags & 0x01) {
+ g->transparent = stbi__get8(s);
+ if (g->transparent >= 0) {
+ g->pal[g->transparent][3] = 0;
+ }
+ } else {
+ // don't need transparent
+ stbi__skip(s, 1);
+ g->transparent = -1;
+ }
+ } else {
+ stbi__skip(s, len);
+ break;
+ }
+ }
+ while ((len = stbi__get8(s)) != 0) {
+ stbi__skip(s, len);
+ }
+ break;
+ }
+
+ case 0x3B: // gif stream termination code
+ return (stbi_uc *) s; // using '1' causes warning on some compilers
+
+ default:
+ return stbi__errpuc("unknown code", "Corrupt GIF");
+ }
+ }
+}
+
+static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
+{
+ STBI_FREE(g->out);
+ STBI_FREE(g->history);
+ STBI_FREE(g->background);
+
+ if (out) STBI_FREE(out);
+ if (delays && *delays) STBI_FREE(*delays);
+ return stbi__errpuc("outofmem", "Out of memory");
+}
+
+static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
+{
+ if (stbi__gif_test(s)) {
+ int layers = 0;
+ stbi_uc *u = 0;
+ stbi_uc *out = 0;
+ stbi_uc *two_back = 0;
+ stbi__gif g;
+ int stride;
+ int out_size = 0;
+ int delays_size = 0;
+
+ STBI_NOTUSED(out_size);
+ STBI_NOTUSED(delays_size);
+
+ memset(&g, 0, sizeof(g));
+ if (delays) {
+ *delays = 0;
+ }
+
+ do {
+ u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
+ if (u == (stbi_uc *) s) u = 0; // end of animated gif marker
+
+ if (u) {
+ *x = g.w;
+ *y = g.h;
+ ++layers;
+ stride = g.w * g.h * 4;
+
+ if (out) {
+ void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
+ if (!tmp)
+ return stbi__load_gif_main_outofmem(&g, out, delays);
+ else {
+ out = (stbi_uc*) tmp;
+ out_size = layers * stride;
+ }
+
+ if (delays) {
+ int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
+ if (!new_delays)
+ return stbi__load_gif_main_outofmem(&g, out, delays);
+ *delays = new_delays;
+ delays_size = layers * sizeof(int);
+ }
+ } else {
+ out = (stbi_uc*)stbi__malloc( layers * stride );
+ if (!out)
+ return stbi__load_gif_main_outofmem(&g, out, delays);
+ out_size = layers * stride;
+ if (delays) {
+ *delays = (int*) stbi__malloc( layers * sizeof(int) );
+ if (!*delays)
+ return stbi__load_gif_main_outofmem(&g, out, delays);
+ delays_size = layers * sizeof(int);
+ }
+ }
+ memcpy( out + ((layers - 1) * stride), u, stride );
+ if (layers >= 2) {
+ two_back = out - 2 * stride;
+ }
+
+ if (delays) {
+ (*delays)[layers - 1U] = g.delay;
+ }
+ }
+ } while (u != 0);
+
+ // free temp buffer;
+ STBI_FREE(g.out);
+ STBI_FREE(g.history);
+ STBI_FREE(g.background);
+
+ // do the final conversion after loading everything;
+ if (req_comp && req_comp != 4)
+ out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
+
+ *z = layers;
+ return out;
+ } else {
+      return stbi__errpuc("not GIF", "Image was not a GIF type.");
+ }
+}
+
+static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ stbi_uc *u = 0;
+ stbi__gif g;
+ memset(&g, 0, sizeof(g));
+ STBI_NOTUSED(ri);
+
+ u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
+ if (u == (stbi_uc *) s) u = 0; // end of animated gif marker
+ if (u) {
+ *x = g.w;
+ *y = g.h;
+
+ // moved conversion to after successful load so that the same
+ // can be done for multiple frames.
+ if (req_comp && req_comp != 4)
+ u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
+ } else if (g.out) {
+ // if there was an error and we allocated an image buffer, free it!
+ STBI_FREE(g.out);
+ }
+
+ // free buffers needed for multiple frame loading;
+ STBI_FREE(g.history);
+ STBI_FREE(g.background);
+
+ return u;
+}
+
+static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ return stbi__gif_info_raw(s,x,y,comp);
+}
+#endif
+
+// *************************************************************************************************
+// Radiance RGBE HDR loader
+// originally by Nicolas Schulz
+#ifndef STBI_NO_HDR
+static int stbi__hdr_test_core(stbi__context *s, const char *signature)
+{
+ int i;
+ for (i=0; signature[i]; ++i)
+ if (stbi__get8(s) != signature[i])
+ return 0;
+ stbi__rewind(s);
+ return 1;
+}
+
+static int stbi__hdr_test(stbi__context* s)
+{
+ int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
+ stbi__rewind(s);
+ if(!r) {
+ r = stbi__hdr_test_core(s, "#?RGBE\n");
+ stbi__rewind(s);
+ }
+ return r;
+}
+
+#define STBI__HDR_BUFLEN 1024
+static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
+{
+ int len=0;
+ char c = '\0';
+
+ c = (char) stbi__get8(z);
+
+ while (!stbi__at_eof(z) && c != '\n') {
+ buffer[len++] = c;
+ if (len == STBI__HDR_BUFLEN-1) {
+ // flush to end of line
+ while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
+ ;
+ break;
+ }
+ c = (char) stbi__get8(z);
+ }
+
+ buffer[len] = 0;
+ return buffer;
+}
+
+static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
+{
+ if ( input[3] != 0 ) {
+ float f1;
+ // Exponent
+ f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
+ if (req_comp <= 2)
+ output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
+ else {
+ output[0] = input[0] * f1;
+ output[1] = input[1] * f1;
+ output[2] = input[2] * f1;
+ }
+ if (req_comp == 2) output[1] = 1;
+ if (req_comp == 4) output[3] = 1;
+ } else {
+ switch (req_comp) {
+ case 4: output[3] = 1; /* fallthrough */
+ case 3: output[0] = output[1] = output[2] = 0;
+ break;
+ case 2: output[1] = 1; /* fallthrough */
+ case 1: output[0] = 0;
+ break;
+ }
+ }
+}
+
+static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ char buffer[STBI__HDR_BUFLEN];
+ char *token;
+ int valid = 0;
+ int width, height;
+ stbi_uc *scanline;
+ float *hdr_data;
+ int len;
+ unsigned char count, value;
+ int i, j, k, c1,c2, z;
+ const char *headerToken;
+ STBI_NOTUSED(ri);
+
+ // Check identifier
+ headerToken = stbi__hdr_gettoken(s,buffer);
+ if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
+ return stbi__errpf("not HDR", "Corrupt HDR image");
+
+ // Parse header
+ for(;;) {
+ token = stbi__hdr_gettoken(s,buffer);
+ if (token[0] == 0) break;
+ if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
+ }
+
+ if (!valid) return stbi__errpf("unsupported format", "Unsupported HDR format");
+
+ // Parse width and height
+ // can't use sscanf() if we're not using stdio!
+ token = stbi__hdr_gettoken(s,buffer);
+ if (strncmp(token, "-Y ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
+ token += 3;
+ height = (int) strtol(token, &token, 10);
+ while (*token == ' ') ++token;
+ if (strncmp(token, "+X ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
+ token += 3;
+ width = (int) strtol(token, NULL, 10);
+
+ if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
+ if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
+
+ *x = width;
+ *y = height;
+
+ if (comp) *comp = 3;
+ if (req_comp == 0) req_comp = 3;
+
+ if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
+ return stbi__errpf("too large", "HDR image is too large");
+
+ // Read data
+ hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
+ if (!hdr_data)
+ return stbi__errpf("outofmem", "Out of memory");
+
+ // Load image data
+ // image data is stored as some number of scan lines
+ if ( width < 8 || width >= 32768) {
+ // Read flat data
+ for (j=0; j < height; ++j) {
+ for (i=0; i < width; ++i) {
+ stbi_uc rgbe[4];
+ main_decode_loop:
+ stbi__getn(s, rgbe, 4);
+ stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
+ }
+ }
+ } else {
+ // Read RLE-encoded data
+ scanline = NULL;
+
+ for (j = 0; j < height; ++j) {
+ c1 = stbi__get8(s);
+ c2 = stbi__get8(s);
+ len = stbi__get8(s);
+ if (c1 != 2 || c2 != 2 || (len & 0x80)) {
+ // not run-length encoded, so we have to actually use THIS data as a decoded
+ // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
+ stbi_uc rgbe[4];
+ rgbe[0] = (stbi_uc) c1;
+ rgbe[1] = (stbi_uc) c2;
+ rgbe[2] = (stbi_uc) len;
+ rgbe[3] = (stbi_uc) stbi__get8(s);
+ stbi__hdr_convert(hdr_data, rgbe, req_comp);
+ i = 1;
+ j = 0;
+ STBI_FREE(scanline);
+ goto main_decode_loop; // yes, this makes no sense
+ }
+ len <<= 8;
+ len |= stbi__get8(s);
+ if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
+ if (scanline == NULL) {
+ scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
+ if (!scanline) {
+ STBI_FREE(hdr_data);
+ return stbi__errpf("outofmem", "Out of memory");
+ }
+ }
+
+ for (k = 0; k < 4; ++k) {
+ int nleft;
+ i = 0;
+ while ((nleft = width - i) > 0) {
+ count = stbi__get8(s);
+ if (count > 128) {
+ // Run
+ value = stbi__get8(s);
+ count -= 128;
+ if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
+ for (z = 0; z < count; ++z)
+ scanline[i++ * 4 + k] = value;
+ } else {
+ // Dump
+ if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
+ for (z = 0; z < count; ++z)
+ scanline[i++ * 4 + k] = stbi__get8(s);
+ }
+ }
+ }
+ for (i=0; i < width; ++i)
+ stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
+ }
+ if (scanline)
+ STBI_FREE(scanline);
+ }
+
+ return hdr_data;
+}
+
+static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ char buffer[STBI__HDR_BUFLEN];
+ char *token;
+ int valid = 0;
+ int dummy;
+
+ if (!x) x = &dummy;
+ if (!y) y = &dummy;
+ if (!comp) comp = &dummy;
+
+ if (stbi__hdr_test(s) == 0) {
+ stbi__rewind( s );
+ return 0;
+ }
+
+ for(;;) {
+ token = stbi__hdr_gettoken(s,buffer);
+ if (token[0] == 0) break;
+ if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
+ }
+
+ if (!valid) {
+ stbi__rewind( s );
+ return 0;
+ }
+ token = stbi__hdr_gettoken(s,buffer);
+ if (strncmp(token, "-Y ", 3)) {
+ stbi__rewind( s );
+ return 0;
+ }
+ token += 3;
+ *y = (int) strtol(token, &token, 10);
+ while (*token == ' ') ++token;
+ if (strncmp(token, "+X ", 3)) {
+ stbi__rewind( s );
+ return 0;
+ }
+ token += 3;
+ *x = (int) strtol(token, NULL, 10);
+ *comp = 3;
+ return 1;
+}
+#endif // STBI_NO_HDR
+
+#ifndef STBI_NO_BMP
+static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ void *p;
+ stbi__bmp_data info;
+
+ info.all_a = 255;
+ p = stbi__bmp_parse_header(s, &info);
+ if (p == NULL) {
+ stbi__rewind( s );
+ return 0;
+ }
+ if (x) *x = s->img_x;
+ if (y) *y = s->img_y;
+ if (comp) {
+ if (info.bpp == 24 && info.ma == 0xff000000)
+ *comp = 3;
+ else
+ *comp = info.ma ? 4 : 3;
+ }
+ return 1;
+}
+#endif
+
+#ifndef STBI_NO_PSD
+static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ int channelCount, dummy, depth;
+ if (!x) x = &dummy;
+ if (!y) y = &dummy;
+ if (!comp) comp = &dummy;
+ if (stbi__get32be(s) != 0x38425053) {
+ stbi__rewind( s );
+ return 0;
+ }
+ if (stbi__get16be(s) != 1) {
+ stbi__rewind( s );
+ return 0;
+ }
+ stbi__skip(s, 6);
+ channelCount = stbi__get16be(s);
+ if (channelCount < 0 || channelCount > 16) {
+ stbi__rewind( s );
+ return 0;
+ }
+ *y = stbi__get32be(s);
+ *x = stbi__get32be(s);
+ depth = stbi__get16be(s);
+ if (depth != 8 && depth != 16) {
+ stbi__rewind( s );
+ return 0;
+ }
+ if (stbi__get16be(s) != 3) {
+ stbi__rewind( s );
+ return 0;
+ }
+ *comp = 4;
+ return 1;
+}
+
+static int stbi__psd_is16(stbi__context *s)
+{
+ int channelCount, depth;
+ if (stbi__get32be(s) != 0x38425053) {
+ stbi__rewind( s );
+ return 0;
+ }
+ if (stbi__get16be(s) != 1) {
+ stbi__rewind( s );
+ return 0;
+ }
+ stbi__skip(s, 6);
+ channelCount = stbi__get16be(s);
+ if (channelCount < 0 || channelCount > 16) {
+ stbi__rewind( s );
+ return 0;
+ }
+ STBI_NOTUSED(stbi__get32be(s));
+ STBI_NOTUSED(stbi__get32be(s));
+ depth = stbi__get16be(s);
+ if (depth != 16) {
+ stbi__rewind( s );
+ return 0;
+ }
+ return 1;
+}
+#endif
+
+#ifndef STBI_NO_PIC
+static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ int act_comp=0,num_packets=0,chained,dummy;
+ stbi__pic_packet packets[10];
+
+ if (!x) x = &dummy;
+ if (!y) y = &dummy;
+ if (!comp) comp = &dummy;
+
+ if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
+ stbi__rewind(s);
+ return 0;
+ }
+
+ stbi__skip(s, 88);
+
+ *x = stbi__get16be(s);
+ *y = stbi__get16be(s);
+ if (stbi__at_eof(s)) {
+ stbi__rewind( s);
+ return 0;
+ }
+ if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
+ stbi__rewind( s );
+ return 0;
+ }
+
+ stbi__skip(s, 8);
+
+ do {
+ stbi__pic_packet *packet;
+
+ if (num_packets==sizeof(packets)/sizeof(packets[0]))
+ return 0;
+
+ packet = &packets[num_packets++];
+ chained = stbi__get8(s);
+ packet->size = stbi__get8(s);
+ packet->type = stbi__get8(s);
+ packet->channel = stbi__get8(s);
+ act_comp |= packet->channel;
+
+ if (stbi__at_eof(s)) {
+ stbi__rewind( s );
+ return 0;
+ }
+ if (packet->size != 8) {
+ stbi__rewind( s );
+ return 0;
+ }
+ } while (chained);
+
+ *comp = (act_comp & 0x10 ? 4 : 3);
+
+ return 1;
+}
+#endif
+
+// *************************************************************************************************
+// Portable Gray Map and Portable Pixel Map loader
+// by Ken Miller
+//
+// PGM: http://netpbm.sourceforge.net/doc/pgm.html
+// PPM: http://netpbm.sourceforge.net/doc/ppm.html
+//
+// Known limitations:
+// Does not support comments in the header section
+// Does not support ASCII image data (formats P2 and P3)
+
+#ifndef STBI_NO_PNM
+
+static int stbi__pnm_test(stbi__context *s)
+{
+ char p, t;
+ p = (char) stbi__get8(s);
+ t = (char) stbi__get8(s);
+ if (p != 'P' || (t != '5' && t != '6')) {
+ stbi__rewind( s );
+ return 0;
+ }
+ return 1;
+}
+
+static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+ stbi_uc *out;
+ STBI_NOTUSED(ri);
+
+ ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
+ if (ri->bits_per_channel == 0)
+ return 0;
+
+ if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+ if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+ *x = s->img_x;
+ *y = s->img_y;
+ if (comp) *comp = s->img_n;
+
+ if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
+ return stbi__errpuc("too large", "PNM too large");
+
+ out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
+ if (!out) return stbi__errpuc("outofmem", "Out of memory");
+ if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) {
+ STBI_FREE(out);
+ return stbi__errpuc("bad PNM", "PNM file truncated");
+ }
+
+ if (req_comp && req_comp != s->img_n) {
+ if (ri->bits_per_channel == 16) {
+ out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y);
+ } else {
+ out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
+ }
+ if (out == NULL) return out; // stbi__convert_format frees input on failure
+ }
+ return out;
+}
+
+static int stbi__pnm_isspace(char c)
+{
+ return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
+}
+
+static void stbi__pnm_skip_whitespace(stbi__context *s, char *c)
+{
+ for (;;) {
+ while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
+ *c = (char) stbi__get8(s);
+
+ if (stbi__at_eof(s) || *c != '#')
+ break;
+
+ while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
+ *c = (char) stbi__get8(s);
+ }
+}
+
+static int stbi__pnm_isdigit(char c)
+{
+ return c >= '0' && c <= '9';
+}
+
+static int stbi__pnm_getinteger(stbi__context *s, char *c)
+{
+ int value = 0;
+
+ while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
+ value = value*10 + (*c - '0');
+ *c = (char) stbi__get8(s);
+ if((value > 214748364) || (value == 214748364 && *c > '7')) // next digit would exceed INT_MAX (2147483647)
+ return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int");
+ }
+
+ return value;
+}
+
+static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
+{
+ int maxv, dummy;
+ char c, p, t;
+
+ if (!x) x = &dummy;
+ if (!y) y = &dummy;
+ if (!comp) comp = &dummy;
+
+ stbi__rewind(s);
+
+ // Get identifier
+ p = (char) stbi__get8(s);
+ t = (char) stbi__get8(s);
+ if (p != 'P' || (t != '5' && t != '6')) {
+ stbi__rewind(s);
+ return 0;
+ }
+
+ *comp = (t == '6') ? 3 : 1; // '5' is 1-component .pgm; '6' is 3-component .ppm
+
+ c = (char) stbi__get8(s);
+ stbi__pnm_skip_whitespace(s, &c);
+
+ *x = stbi__pnm_getinteger(s, &c); // read width
+ if(*x == 0)
+ return stbi__err("invalid width", "PPM image header had zero or overflowing width");
+ stbi__pnm_skip_whitespace(s, &c);
+
+ *y = stbi__pnm_getinteger(s, &c); // read height
+ if (*y == 0)
+ return stbi__err("invalid height", "PPM image header had zero or overflowing height");
+ stbi__pnm_skip_whitespace(s, &c);
+
+ maxv = stbi__pnm_getinteger(s, &c); // read max value
+ if (maxv > 65535)
+ return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
+ else if (maxv > 255)
+ return 16;
+ else
+ return 8;
+}
+
+static int stbi__pnm_is16(stbi__context *s)
+{
+ if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
+ return 1;
+ return 0;
+}
+#endif
+
+static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
+{
+ #ifndef STBI_NO_JPEG
+ if (stbi__jpeg_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_PNG
+ if (stbi__png_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_GIF
+ if (stbi__gif_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_BMP
+ if (stbi__bmp_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_PSD
+ if (stbi__psd_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_PIC
+ if (stbi__pic_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_PNM
+ if (stbi__pnm_info(s, x, y, comp)) return 1;
+ #endif
+
+ #ifndef STBI_NO_HDR
+ if (stbi__hdr_info(s, x, y, comp)) return 1;
+ #endif
+
+ // test tga last because it's a crappy test!
+ #ifndef STBI_NO_TGA
+ if (stbi__tga_info(s, x, y, comp))
+ return 1;
+ #endif
+ return stbi__err("unknown image type", "Image not of any known type, or corrupt");
+}
+
+static int stbi__is_16_main(stbi__context *s)
+{
+ #ifndef STBI_NO_PNG
+ if (stbi__png_is16(s)) return 1;
+ #endif
+
+ #ifndef STBI_NO_PSD
+ if (stbi__psd_is16(s)) return 1;
+ #endif
+
+ #ifndef STBI_NO_PNM
+ if (stbi__pnm_is16(s)) return 1;
+ #endif
+ return 0;
+}
+
+#ifndef STBI_NO_STDIO
+STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
+{
+ FILE *f = stbi__fopen(filename, "rb");
+ int result;
+ if (!f) return stbi__err("can't fopen", "Unable to open file");
+ result = stbi_info_from_file(f, x, y, comp);
+ fclose(f);
+ return result;
+}
+
+STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
+{
+ int r;
+ stbi__context s;
+ long pos = ftell(f);
+ stbi__start_file(&s, f);
+ r = stbi__info_main(&s,x,y,comp);
+ fseek(f,pos,SEEK_SET);
+ return r;
+}
+
+STBIDEF int stbi_is_16_bit(char const *filename)
+{
+ FILE *f = stbi__fopen(filename, "rb");
+ int result;
+ if (!f) return stbi__err("can't fopen", "Unable to open file");
+ result = stbi_is_16_bit_from_file(f);
+ fclose(f);
+ return result;
+}
+
+STBIDEF int stbi_is_16_bit_from_file(FILE *f)
+{
+ int r;
+ stbi__context s;
+ long pos = ftell(f);
+ stbi__start_file(&s, f);
+ r = stbi__is_16_main(&s);
+ fseek(f,pos,SEEK_SET);
+ return r;
+}
+#endif // !STBI_NO_STDIO
+
+STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
+{
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+ return stbi__info_main(&s,x,y,comp);
+}
+
+STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
+{
+ stbi__context s;
+ stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
+ return stbi__info_main(&s,x,y,comp);
+}
+
+STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
+{
+ stbi__context s;
+ stbi__start_mem(&s,buffer,len);
+ return stbi__is_16_main(&s);
+}
+
+STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
+{
+ stbi__context s;
+ stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
+ return stbi__is_16_main(&s);
+}
+
+#endif // STB_IMAGE_IMPLEMENTATION
+
+/*
+ revision history:
+ 2.20 (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
+ 2.19 (2018-02-11) fix warning
+ 2.18 (2018-01-30) fix warnings
+ 2.17 (2018-01-29) change stbi__shiftsigned to avoid clang -O2 bug
+ 1-bit BMP
+ *_is_16_bit api
+ avoid warnings
+ 2.16 (2017-07-23) all functions have 16-bit variants;
+ STBI_NO_STDIO works again;
+ compilation fixes;
+ fix rounding in unpremultiply;
+ optimize vertical flip;
+ disable raw_len validation;
+ documentation fixes
+ 2.15 (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
+ warning fixes; disable run-time SSE detection on gcc;
+ uniform handling of optional "return" values;
+ thread-safe initialization of zlib tables
+ 2.14 (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
+ 2.13 (2016-11-29) add 16-bit API, only supported for PNG right now
+ 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
+ 2.11 (2016-04-02) allocate large structures on the stack
+ remove white matting for transparent PSD
+ fix reported channel count for PNG & BMP
+ re-enable SSE2 in non-gcc 64-bit
+ support RGB-formatted JPEG
+ read 16-bit PNGs (only as 8-bit)
+ 2.10 (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
+ 2.09 (2016-01-16) allow comments in PNM files
+ 16-bit-per-pixel TGA (not bit-per-component)
+ info() for TGA could break due to .hdr handling
+ info() for BMP now shares code instead of sloppy parse
+ can use STBI_REALLOC_SIZED if allocator doesn't support realloc
+ code cleanup
+ 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
+ 2.07 (2015-09-13) fix compiler warnings
+ partial animated GIF support
+ limited 16-bpc PSD support
+ #ifdef unused functions
+ bug with < 92 byte PIC,PNM,HDR,TGA
+ 2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value
+ 2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning
+ 2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit
+ 2.03 (2015-04-12) extra corruption checking (mmozeiko)
+ stbi_set_flip_vertically_on_load (nguillemot)
+ fix NEON support; fix mingw support
+ 2.02 (2015-01-19) fix incorrect assert, fix warning
+ 2.01 (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
+ 2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
+ 2.00 (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
+ progressive JPEG (stb)
+ PGM/PPM support (Ken Miller)
+ STBI_MALLOC,STBI_REALLOC,STBI_FREE
+ GIF bugfix -- seemingly never worked
+ STBI_NO_*, STBI_ONLY_*
+ 1.48 (2014-12-14) fix incorrectly-named assert()
+ 1.47 (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
+ optimize PNG (ryg)
+ fix bug in interlaced PNG with user-specified channel count (stb)
+ 1.46 (2014-08-26)
+ fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
+ 1.45 (2014-08-16)
+ fix MSVC-ARM internal compiler error by wrapping malloc
+ 1.44 (2014-08-07)
+ various warning fixes from Ronny Chevalier
+ 1.43 (2014-07-15)
+ fix MSVC-only compiler problem in code changed in 1.42
+ 1.42 (2014-07-09)
+ don't define _CRT_SECURE_NO_WARNINGS (affects user code)
+ fixes to stbi__cleanup_jpeg path
+ added STBI_ASSERT to avoid requiring assert.h
+ 1.41 (2014-06-25)
+ fix search&replace from 1.36 that messed up comments/error messages
+ 1.40 (2014-06-22)
+ fix gcc struct-initialization warning
+ 1.39 (2014-06-15)
+ fix to TGA optimization when req_comp != number of components in TGA;
+ fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
+ add support for BMP version 5 (more ignored fields)
+ 1.38 (2014-06-06)
+ suppress MSVC warnings on integer casts truncating values
+ fix accidental rename of 'skip' field of I/O
+ 1.37 (2014-06-04)
+ remove duplicate typedef
+ 1.36 (2014-06-03)
+ convert to header file single-file library
+ if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
+ 1.35 (2014-05-27)
+ various warnings
+ fix broken STBI_SIMD path
+ fix bug where stbi_load_from_file no longer left file pointer in correct place
+ fix broken non-easy path for 32-bit BMP (possibly never used)
+ TGA optimization by Arseny Kapoulkine
+ 1.34 (unknown)
+ use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
+ 1.33 (2011-07-14)
+ make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
+ 1.32 (2011-07-13)
+ support for "info" function for all supported filetypes (SpartanJ)
+ 1.31 (2011-06-20)
+ a few more leak fixes, bug in PNG handling (SpartanJ)
+ 1.30 (2011-06-11)
+ added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
+ removed deprecated format-specific test/load functions
+ removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
+ error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
+ fix inefficiency in decoding 32-bit BMP (David Woo)
+ 1.29 (2010-08-16)
+ various warning fixes from Aurelien Pocheville
+ 1.28 (2010-08-01)
+ fix bug in GIF palette transparency (SpartanJ)
+ 1.27 (2010-08-01)
+ cast-to-stbi_uc to fix warnings
+ 1.26 (2010-07-24)
+ fix bug in file buffering for PNG reported by SpartanJ
+ 1.25 (2010-07-17)
+ refix trans_data warning (Won Chun)
+ 1.24 (2010-07-12)
+ perf improvements reading from files on platforms with lock-heavy fgetc()
+ minor perf improvements for jpeg
+ deprecated type-specific functions so we'll get feedback if they're needed
+ attempt to fix trans_data warning (Won Chun)
+ 1.23 fixed bug in iPhone support
+ 1.22 (2010-07-10)
+ removed image *writing* support
+ stbi_info support from Jetro Lauha
+ GIF support from Jean-Marc Lienher
+ iPhone PNG-extensions from James Brown
+ warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez Žemva)
+ 1.21 fix use of 'stbi_uc' in header (reported by jon blow)
+ 1.20 added support for Softimage PIC, by Tom Seddon
+ 1.19 bug in interlaced PNG corruption check (found by ryg)
+ 1.18 (2008-08-02)
+ fix a threading bug (local mutable static)
+ 1.17 support interlaced PNG
+ 1.16 major bugfix - stbi__convert_format converted one too many pixels
+ 1.15 initialize some fields for thread safety
+ 1.14 fix threadsafe conversion bug
+ header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
+ 1.13 threadsafe
+ 1.12 const qualifiers in the API
+ 1.11 Support installable IDCT, colorspace conversion routines
+ 1.10 Fixes for 64-bit (don't use "unsigned long")
+ optimized upsampling by Fabian "ryg" Giesen
+ 1.09 Fix format-conversion for PSD code (bad global variables!)
+ 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz
+ 1.07 attempt to fix C++ warning/errors again
+ 1.06 attempt to fix C++ warning/errors again
+ 1.05 fix TGA loading to return correct *comp and use good luminance calc
+ 1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free
+ 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR
+ 1.02 support for (subset of) HDR files, float interface for preferred access to them
+ 1.01 fix bug: possible bug in handling right-side up bmps... not sure
+ fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
+ 1.00 interface to zlib that skips zlib header
+ 0.99 correct handling of alpha in palette
+ 0.98 TGA loader by lonesock; dynamically add loaders (untested)
+ 0.97 jpeg errors on too large a file; also catch another malloc failure
+ 0.96 fix detection of invalid v value - particleman@mollyrocket forum
+ 0.95 during header scan, seek to markers in case of padding
+ 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same
+ 0.93 handle jpegtran output; verbose errors
+ 0.92 read 4,8,16,24,32-bit BMP files of several formats
+ 0.91 output 24-bit Windows 3.0 BMP files
+ 0.90 fix a few more warnings; bump version number to approach 1.0
+ 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd
+ 0.60 fix compiling as c++
+ 0.59 fix warnings: merge Dave Moore's -Wall fixes
+ 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian
+ 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
+ 0.56 fix bug: zlib uncompressed mode len vs. nlen
+ 0.55 fix bug: restart_interval not initialized to 0
+ 0.54 allow NULL for 'int *comp'
+ 0.53 fix bug in png 3->4; speedup png decoding
+ 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments
+ 0.51 obey req_comp requests, 1-component jpegs return as 1-component,
+ on 'test' only check type, not whether we support this variant
+ 0.50 (2006-11-19)
+ first released version
+*/
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_image_resize2.h b/vendor/stb/stb_image_resize2.h
new file mode 100644
index 0000000..6146ab7
--- /dev/null
+++ b/vendor/stb/stb_image_resize2.h
@@ -0,0 +1,10651 @@
+/* stb_image_resize2 - v2.17 - public domain image resizing
+
+ by Jeff Roberts (v2) and Jorge L Rodriguez
+ http://github.com/nothings/stb
+
+ Can be threaded with the extended API. SSE2, AVX, Neon and WASM SIMD support. Only
+ scaling and translation are supported, no rotations or shears.
+
+ COMPILING & LINKING
+ In one C/C++ file that #includes this file, do this:
+ #define STB_IMAGE_RESIZE_IMPLEMENTATION
+ before the #include. That will create the implementation in that file.
+
+ EASY API CALLS:
+ Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation, clamps to edge.
+
+ stbir_resize_uint8_srgb( input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout_enum )
+
+ stbir_resize_uint8_linear( input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout_enum )
+
+ stbir_resize_float_linear( input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout_enum )
+
+ If you pass NULL or zero for the output_pixels, we will allocate the output buffer
+ for you and return it from the function (free with free() or STBIR_FREE).
+ As a special case, XX_stride_in_bytes of 0 means packed continuously in memory.
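A minimal usage sketch of the easy API described above, assuming RGBA input. The function name, argument order, and the STBIR_RGBA layout enum come from this header; `halve_rgba` and its buffers are hypothetical names for illustration. Stride 0 means rows are packed contiguously, and a NULL output pointer asks the library to allocate the result:

```c
// Sketch: shrink a non-premultiplied RGBA image to half size using the
// easy API. Returns out (or a newly allocated buffer if out is NULL;
// free it with free()/STBIR_FREE).
#define STB_IMAGE_RESIZE_IMPLEMENTATION
#include "stb_image_resize2.h"

unsigned char *halve_rgba(const unsigned char *in, int w, int h,
                          unsigned char *out)
{
    return stbir_resize_uint8_srgb(in,  w,     h,     0,   // stride 0 = packed
                                   out, w / 2, h / 2, 0,
                                   STBIR_RGBA);            // non-premultiplied RGBA
}
```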
+
+ API LEVELS
+ There are three levels of API - easy-to-use, medium-complexity and extended-complexity.
+
+ See the "header file" section of the source for API documentation.
+
+ ADDITIONAL DOCUMENTATION
+
+ MEMORY ALLOCATION
+ By default, we use malloc and free for memory allocation. To override the
+ memory allocation, before the implementation #include, add a:
+
+ #define STBIR_MALLOC(size,user_data) ...
+ #define STBIR_FREE(ptr,user_data) ...
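A hedged sketch of the override mechanism just described; `my_arena`, `my_arena_alloc`, and `my_arena_free` are hypothetical user-side names, not part of this library. The `user_data` argument is the pointer you hand to the extended API:

```c
// Route all stb_image_resize allocations through a user-supplied arena.
// These defines must appear before the implementation #include.
#define STBIR_MALLOC(size,user_data) my_arena_alloc((my_arena *)(user_data), (size))
#define STBIR_FREE(ptr,user_data)    my_arena_free((my_arena *)(user_data), (ptr))
#define STB_IMAGE_RESIZE_IMPLEMENTATION
#include "stb_image_resize2.h"
```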
+
+ Each resize makes exactly one call to malloc/free (unless you use the
+ extended API where you can do one allocation for many resizes). Under
+ address sanitizer, we do separate allocations to find overread/writes.
+
+ PERFORMANCE
+ This library was written with an emphasis on performance. When testing
+ stb_image_resize with RGBA, the fastest mode is STBIR_4CHANNEL with
+ STBIR_TYPE_UINT8 pixels and CLAMPed edges (which is what many other resize
+ libs do by default). Also, make sure SIMD is turned on of course (default
+ for 64-bit targets). Avoid WRAP edge mode if you want the fastest speed.
+
+ This library also comes with profiling built-in. If you define STBIR_PROFILE,
+ you can use the advanced API and get low-level profiling information by
+ calling stbir_resize_extended_profile_info() or stbir_resize_split_profile_info()
+ after a resize.
+
+ SIMD
+ Most of the routines have optimized SSE2, AVX, NEON and WASM versions.
+
+ On Microsoft compilers, we automatically turn on SIMD for 64-bit x64 and
+ ARM; for 32-bit x86 and ARM, you select SIMD mode by defining STBIR_SSE2 or
+ STBIR_NEON. For AVX and AVX2, we auto-select it by detecting the /arch:AVX
+ or /arch:AVX2 switches. You can also always manually turn SSE2, AVX or AVX2
+ support on by defining STBIR_SSE2, STBIR_AVX or STBIR_AVX2.
+
+ On Linux, SSE2 and Neon are on by default for 64-bit x64 or ARM64. For 32-bit,
+ we select x86 SIMD mode by whether you have -msse2, -mavx or -mavx2 enabled
+ on the command line. For 32-bit ARM, you must pass -mfpu=neon-vfpv4 for both
+ clang and GCC, but GCC also requires an additional -mfp16-format=ieee to
+ automatically enable NEON.
+
+ On x86 platforms, you can also define STBIR_FP16C to turn on FP16C instructions
+ for converting back and forth to half-floats. This is autoselected when we
+ are using AVX2. Clang and GCC also require the -mf16c switch. ARM always uses
+ the built-in half float hardware NEON instructions.
+
+ You can also tell us to use multiply-add instructions with STBIR_USE_FMA.
+ Because x86 doesn't always have fma, we turn it off by default to maintain
+ determinism across all platforms. If you don't care about non-FMA determinism
+ and are willing to restrict yourself to more recent x86 CPUs (around the AVX
+ timeframe), then fma will give you around a 15% speedup.
+
+ You can force off SIMD in all cases by defining STBIR_NO_SIMD. You can turn
+ off AVX or AVX2 specifically with STBIR_NO_AVX or STBIR_NO_AVX2. AVX is 10%
+ to 40% faster, and AVX2 is generally another 12%.
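Pulling the SIMD knobs above together, a build configuration might look like the following sketch (the choice of flags is an assumption for illustration; it accepts FMA's deviation from non-FMA results in exchange for the ~15% speedup):

```c
// Example compile-time SIMD configuration, placed before the
// implementation #include. STBIR_AVX2 forces the AVX2 paths on;
// STBIR_USE_FMA opts in to fused multiply-add (non-deterministic
// vs. non-FMA builds). Use STBIR_NO_SIMD to force everything off.
#define STBIR_AVX2
#define STBIR_USE_FMA
#define STB_IMAGE_RESIZE_IMPLEMENTATION
#include "stb_image_resize2.h"
```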
+
+ ALPHA CHANNEL
+ Most of the resizing functions provide the ability to control how the alpha
+ channel of an image is processed.
+
+ When alpha represents transparency, it is important that when combining
+ colors with filtering, the pixels should not be treated equally; they
+ should use a weighted average based on their alpha values. For example,
+ if a pixel is 1% opaque bright green and another pixel is 99% opaque
+ black and you average them, the average will be 50% opaque, but the
+ unweighted average will be a middling green color, while the weighted
+ average will be nearly black. This means the unweighted version introduced
+ green energy that didn't exist in the source image.
+
+ (If you want to know why this makes sense, you can work out the math for
+ the following: consider what happens if you alpha composite a source image
+ over a fixed color and then average the output, vs. if you average the
+ source image pixels and then composite that over the same fixed color.
+ Only the weighted average produces the same result as the ground truth
+ composite-then-average result.)
+
+ Therefore, it is in general best to "alpha weight" the pixels when applying
+ filters to them. This essentially means multiplying the colors by the alpha
+ values before combining them, and then dividing by the alpha value at the
+ end.
+
+ The computer graphics industry introduced a technique called "premultiplied
+ alpha" or "associated alpha" in which image colors are stored in image files
+ already multiplied by their alpha. This saves some math when compositing,
+ and also avoids the need to divide by the alpha at the end (which is quite
+ inefficient). However, while premultiplied alpha is common in the movie CGI
+ industry, it is not commonplace in other industries like videogames, and most
+ consumer file formats are generally expected to contain not-premultiplied
+ colors. For example, Photoshop saves PNG files "unpremultiplied", and web
+ browsers like Chrome and Firefox expect PNG images to be unpremultiplied.
+
+ Note that there are three possibilities that might describe your image
+ and resize expectation:
+
+ 1. images are not premultiplied, alpha weighting is desired
+ 2. images are not premultiplied, alpha weighting is not desired
+ 3. images are premultiplied
+
+ Both case #2 and case #3 require the exact same math: no alpha weighting
+ should be applied or removed. Only case 1 requires extra math operations;
+ the other two cases can be handled identically.
+
+ stb_image_resize expects case #1 by default, applying alpha weighting to
+ images, expecting the input images to be unpremultiplied. This is what the
+ COLOR+ALPHA buffer types tell the resizer to do.
+
+ When you use the pixel layouts STBIR_RGBA, STBIR_BGRA, STBIR_ARGB,
+ STBIR_ABGR, STBIR_RA, or STBIR_AR, you are telling us that the pixels are
+ non-premultiplied. In these cases, the resizer will alpha weight the colors
+ (effectively creating the premultiplied image), do the filtering, and then
+ convert back to non-premult on exit.
+
+ When you use the pixel layouts STBIR_RGBA_PM, STBIR_BGRA_PM, STBIR_ARGB_PM,
+ STBIR_ABGR_PM, STBIR_RA_PM or STBIR_AR_PM, you are telling us that the pixels
+ ARE premultiplied. In this case, the resizer doesn't have to do the
+ premultiplying - it can filter directly on the input. This is about twice as
+ fast as the non-premultiplied case, so it's the right option if your data is
+ already set up correctly.
+
+ When you use the pixel layout STBIR_4CHANNEL or STBIR_2CHANNEL, you are
+ telling us that there is no channel that represents transparency; it may be
+ RGB and some unrelated fourth channel that has been stored in the alpha
+ channel, but it is actually not alpha. No special processing will be
+ performed.
+
+ The difference between the generic 4 or 2 channel layouts and the
+ specialized _PM versions is that with the _PM versions you are telling us
+ that the data *is* alpha, just that it should not be premultiplied. That's
+ important when using SRGB pixel formats: we need to know where the alpha is,
+ because it is converted linearly (rather than with the SRGB converters).
+
+ Because alpha weighting produces the same effect as premultiplying, you
+ even have the option with non-premultiplied inputs to let the resizer
+ produce a premultiplied output. Because the initially computed alpha-weighted
+ output image is effectively premultiplied, this is actually more performant
+ than the normal path, which un-premultiplies the output image as a final step.
+
+ Finally, when converting both in and out of non-premultiplied space (for
+ example, when using STBIR_RGBA), we go to somewhat heroic measures to
+ ensure that areas with zero alpha value pixels get something reasonable
+ in the RGB values. If you don't care about the RGB values of zero alpha
+ pixels, you can call the stbir_set_non_pm_alpha_speed_over_quality()
+ function - this runs a premultiplied resize about 25% faster. That said,
+ when you really care about speed, using premultiplied pixels for both in
+ and out (STBIR_RGBA_PM, etc) is much faster than either of these
+ non-premultiplied options.
+
+ PIXEL LAYOUT CONVERSION
+ The resizer can convert from some pixel layouts to others. When using the
+ stbir_set_pixel_layouts(), you can, for example, specify STBIR_RGBA
+ on input, and STBIR_ARGB on output, and it will re-organize the channels
+ during the resize. Currently, you can only convert between two pixel
+ layouts with the same number of channels.
+
+ DETERMINISM
+ We commit to being deterministic (from x64 to ARM to scalar to SIMD, etc).
+ This requires compiling with fast-math off (using at least /fp:precise).
+ Also, you must turn off fp-contracting (which turns mult+adds into fmas)!
+ We attempt to do this with pragmas, but with Clang, you usually want to add
+ -ffp-contract=off to the command line as well.
+
+ For 32-bit x86, you must use SSE and SSE2 codegen for determinism. That is,
+ if the scalar x87 unit gets used at all, we immediately lose determinism.
+ On Microsoft Visual Studio 2008 and earlier, from what we can tell there is
+ no way to be deterministic in 32-bit x86 (some x87 always leaks in, even
+ with fp:strict). On 32-bit x86 GCC, determinism requires both -msse2 and
+ -mfpmath=sse.
+
+ Note that we will not be deterministic with float data containing NaNs -
+ the NaNs will propagate differently on different SIMD and platforms.
+
+ If you turn on STBIR_USE_FMA, then we will be deterministic with other
+ fma targets, but we will differ from non-fma targets (this is unavoidable,
+ because an fma isn't simply an add with a mult - it also introduces a
+ rounding difference compared to non-fma instruction sequences).
+
+ FLOAT PIXEL FORMAT RANGE
+ Any range of values can be used for the non-alpha float data that you pass
+ in (0 to 1, -1 to 1, whatever). However, if you are inputting float values
+ but *outputting* bytes or shorts, you must use a range of 0 to 1 so that we
+ scale back properly. The alpha channel must also be 0 to 1 for any format
+ that does premultiplication prior to resizing.
+
+ Note also that with float output, filters with negative lobes can push the
+ filtered output values slightly out of range. You can define
+ STBIR_FLOAT_LOW_CLAMP and/or STBIR_FLOAT_HIGH_CLAMP to specify the range
+ to clamp to on output, if that's important.
+
+ MAX/MIN SCALE FACTORS
+ The input pixel resolutions are in integers, and we do the internal pointer
+ resolution in size_t sized integers. However, the scale ratio from input
+ resolution to output resolution is calculated in float form. This means
+ the effective possible scale ratio is limited to 24 bits (or 16 million
+ to 1). As you get close to the size of the float resolution (again, 16
+ million pixels wide or high), you might start seeing float inaccuracy
+ issues in general in the pipeline. If you have to do extreme resizes,
+ you can usually do this in multiple stages (using float intermediate
+ buffers).
+
+ FLIPPED IMAGES
+ Stride is just the delta from one scanline to the next. This means you can
+ handle inverted images by pointing at the final scanline and using a
+ negative stride. You can invert the input, the output, or both this way.
+
+ DEFAULT FILTERS
+ For functions which don't provide explicit control over what filters to
+ use, you can change the compile-time defaults with:
+
+ #define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_something
+ #define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_something
+
+ See stbir_filter in the header-file section for the list of filters.
+
+ NEW FILTERS
+ A number of 1D filter kernels are supplied. For a list of supported
+ filters, see the stbir_filter enum. You can install your own filters by
+ using the stbir_set_filter_callbacks function.
+
+ PROGRESS
+ For interactive use with slow resize operations, you can use the
+ scanline callbacks in the extended API. It would have to be a *very* large
+ image resample to need progress though - we're very fast.
+
+ CEIL and FLOOR
+ In scalar mode, the only functions we use from math.h are ceilf and floorf,
+ but if you have your own versions, you can define the STBIR_CEILF(v) and
+ STBIR_FLOORF(v) macros and we'll use them instead. In SIMD, we just use
+ our own versions.
+
+ ASSERT
+ Define STBIR_ASSERT(boolval) to override assert() and not use assert.h
+
+ PORTING FROM VERSION 1
+ The API has changed. You can continue to use the old version of stb_image_resize.h,
+ which is available in the "deprecated/" directory.
+
+ If you're using the old simple-to-use API, porting is straightforward.
+ (For more advanced APIs, read the documentation.)
+
+ stbir_resize_uint8():
+ - call `stbir_resize_uint8_linear`, cast channel count to `stbir_pixel_layout`
+
+ stbir_resize_float():
+ - call `stbir_resize_float_linear`, cast channel count to `stbir_pixel_layout`
+
+ stbir_resize_uint8_srgb():
+ - function name is unchanged
+ - cast channel count to `stbir_pixel_layout`
+ - above is sufficient unless your image has alpha and it's not RGBA/BGRA
+ - in that case, follow the below instructions for stbir_resize_uint8_srgb_edgemode
+
+ stbir_resize_uint8_srgb_edgemode()
+ - switch to the "medium complexity" API
+ - stbir_resize(), very similar API but a few more parameters:
+ - pixel_layout: cast channel count to `stbir_pixel_layout`
+ - data_type: STBIR_TYPE_UINT8_SRGB
+ - edge: unchanged (STBIR_EDGE_WRAP, etc.)
+ - filter: STBIR_FILTER_DEFAULT
+ - which channel is alpha is specified in stbir_pixel_layout, see enum for details
+
+ FUTURE TODOS
+ * For polyphase integral filters, we just memcpy the coeffs to dupe
+ them, but we should indirect and use the same coeff memory.
+ * Add pixel layout conversions for sensible different channel counts
+ (maybe, 1->3/4, 3->4, 4->1, 3->1).
+ * For SIMD encode and decode scanline routines, do any pre-aligning
+ for bad input/output buffer alignments and pitch?
+ * For very wide scanlines, should we do vertical strips to stay within
+ L2 cache. Maybe do chunks of 1K pixels at a time. There would be
+ some pixel reconversion, but probably dwarfed by things falling out
+ of cache. Probably also something possible with alternating between
+ scattering and gathering at high resize scales?
+ * Should we have a multiple MIPs at the same time function (could keep
+ more memory in cache during multiple resizes)?
+ * Rewrite the coefficient generator to do many at once.
+ * AVX-512 vertical kernels - worried about downclocking here.
+ * Convert the reincludes to macros when we know they aren't changing.
+ * Experiment with pivoting the horizontal and always using the
+ vertical filters (which are faster, but perhaps not enough to overcome
+ the pivot cost and the extra memory touches). Need to buffer the whole
+ image so have to balance memory use.
+ * Most of our code is internally function pointers, should we compile
+ all the SIMD stuff always and dynamically dispatch?
+
+ CONTRIBUTORS
+ Jeff Roberts: 2.0 implementation, optimizations, SIMD
+ Martins Mozeiko: NEON simd, WASM simd, clang and GCC whisperer
+ Fabian Giesen: half float and srgb converters
+ Sean Barrett: API design, optimizations
+ Jorge L Rodriguez: Original 1.0 implementation
+ Aras Pranckevicius: bugfixes
+ Nathan Reed: warning fixes for 1.0
+
+ REVISIONS
+ 2.17 (2025-10-25) silly format bug in easy-to-use APIs.
+ 2.16 (2025-10-21) fixed the easy-to-use APIs to allow inverted bitmaps (negative
+ strides), fix vertical filter kernel callback, fix threaded
+ gather buffer priming (and assert).
+ (thanks adipose, TainZerL, and Harrison Green)
+ 2.15 (2025-07-17) fixed an assert in debug mode when using floats with input
+ callbacks, work around GCC warning when adding to null ptr
+ (thanks Johannes Spohr and Pyry Kovanen).
+ 2.14 (2025-05-09) fixed a bug using downsampling gather horizontal first, and
+ scatter with vertical first.
+ 2.13 (2025-02-27) fixed a bug when using input callbacks, turned off simd for
+ tiny-c, fixed some variables that should have been static,
+ fixes a bug when calculating temp memory with resizes that
+ exceed 2GB of temp memory (very large resizes).
+ 2.12 (2024-10-18) fix incorrect use of user_data with STBIR_FREE
+ 2.11 (2024-09-08) fix harmless asan warnings in 2-channel and 3-channel mode
+ with AVX-2, fix some weird scaling edge conditions with
+ point sample mode.
+ 2.10 (2024-07-27) fix the defines GCC and mingw for loop unroll control,
+ fix MSVC 32-bit arm half float routines.
+ 2.09 (2024-06-19) fix the defines for 32-bit ARM GCC builds (was selecting
+ hardware half floats).
+ 2.08 (2024-06-10) fix for RGB->BGR three channel flips and add SIMD (thanks
+ to Ryan Salsbury), fix for sub-rect resizes, use the
+ pragmas to control unrolling when they are available.
+ 2.07 (2024-05-24) fix for slow final split during threaded conversions of very
+ wide scanlines when downsampling (caused by extra input
+ converting), fix for wide scanline resamples with many
+ splits (int overflow), fix GCC warning.
+ 2.06 (2024-02-10) fix for identical width/height 3x or more down-scaling
+ undersampling a single row on rare resize ratios (about 1%).
+ 2.05 (2024-02-07) fix for 2 pixel to 1 pixel resizes with wrap (thanks Aras),
+ fix for output callback (thanks Julien Koenen).
+ 2.04 (2023-11-17) fix for rare AVX bug, shadowed symbol (thanks Nikola Smiljanic).
+ 2.03 (2023-11-01) ASAN and TSAN warnings fixed, minor tweaks.
+ 2.00 (2023-10-10) mostly new source: new api, optimizations, simd, vertical-first, etc
+ 2x-5x faster without simd, 4x-12x faster with simd,
+ in some cases, 20x to 40x faster esp resizing large to very small.
+ 0.96 (2019-03-04) fixed warnings
+ 0.95 (2017-07-23) fixed warnings
+ 0.94 (2017-03-18) fixed warnings
+ 0.93 (2017-03-03) fixed bug with certain combinations of heights
+ 0.92 (2017-01-02) fix integer overflow on large (>2GB) images
+ 0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
+ 0.90 (2014-09-17) first released version
+
+ LICENSE
+ See end of file for license information.
+*/
+
+#if !defined(STB_IMAGE_RESIZE_DO_HORIZONTALS) && !defined(STB_IMAGE_RESIZE_DO_VERTICALS) && !defined(STB_IMAGE_RESIZE_DO_CODERS) // for internal re-includes
+
+#ifndef STBIR_INCLUDE_STB_IMAGE_RESIZE2_H
+#define STBIR_INCLUDE_STB_IMAGE_RESIZE2_H
+
+#include <stddef.h>
+#ifdef _MSC_VER
+typedef unsigned char stbir_uint8;
+typedef unsigned short stbir_uint16;
+typedef unsigned int stbir_uint32;
+typedef unsigned __int64 stbir_uint64;
+#else
+#include <stdint.h>
+typedef uint8_t stbir_uint8;
+typedef uint16_t stbir_uint16;
+typedef uint32_t stbir_uint32;
+typedef uint64_t stbir_uint64;
+#endif
+
+#ifndef STBIRDEF
+#ifdef STB_IMAGE_RESIZE_STATIC
+#define STBIRDEF static
+#else
+#ifdef __cplusplus
+#define STBIRDEF extern "C"
+#else
+#define STBIRDEF extern
+#endif
+#endif
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//// start "header file" ///////////////////////////////////////////////////
+//
+// Easy-to-use API:
+//
+// * stride is the offset between successive rows of image data
+// in memory, in bytes. specify 0 for packed continuously in memory
+// * colorspace is linear or sRGB as specified by function name
+// * Uses the default filters
+// * Uses edge mode clamped
+// * returned result is 1 for success or 0 in case of an error.
+
+
+// stbir_pixel_layout specifies:
+// number of channels
+// order of channels
+// whether color is premultiplied by alpha
+// for back compatibility, you can cast the old channel count to an stbir_pixel_layout
+typedef enum
+{
+ STBIR_1CHANNEL = 1,
+ STBIR_2CHANNEL = 2,
+ STBIR_RGB = 3, // 3-chan, with order specified (for channel flipping)
+ STBIR_BGR = 0, // 3-chan, with order specified (for channel flipping)
+ STBIR_4CHANNEL = 5,
+
+ STBIR_RGBA = 4, // alpha formats, where alpha is NOT premultiplied into color channels
+ STBIR_BGRA = 6,
+ STBIR_ARGB = 7,
+ STBIR_ABGR = 8,
+ STBIR_RA = 9,
+ STBIR_AR = 10,
+
+ STBIR_RGBA_PM = 11, // alpha formats, where alpha is premultiplied into color channels
+ STBIR_BGRA_PM = 12,
+ STBIR_ARGB_PM = 13,
+ STBIR_ABGR_PM = 14,
+ STBIR_RA_PM = 15,
+ STBIR_AR_PM = 16,
+
+  STBIR_RGBA_NO_AW = 11,    // alpha formats, where NO alpha weighting is applied at all!
+  STBIR_BGRA_NO_AW = 12,    // these are just synonyms for the _PM flags (which also do
+  STBIR_ARGB_NO_AW = 13,    // no alpha weighting). These names just make it more
+  STBIR_ABGR_NO_AW = 14,    // clear for some folks.
+  STBIR_RA_NO_AW   = 15,
+  STBIR_AR_NO_AW   = 16,
+
+} stbir_pixel_layout;
+
+//===============================================================
+// Simple-complexity API
+//
+// If output_pixels is NULL (0), then we will allocate the buffer and return it to you.
+//--------------------------------
+
+STBIRDEF unsigned char * stbir_resize_uint8_srgb( const unsigned char *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_type );
+
+STBIRDEF unsigned char * stbir_resize_uint8_linear( const unsigned char *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_type );
+
+STBIRDEF float * stbir_resize_float_linear( const float *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ float *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_type );
+//===============================================================
+
+//===============================================================
+// Medium-complexity API
+//
+// This extends the easy-to-use API as follows:
+//
+// * Can specify the datatype - U8, U8_SRGB, U16, FLOAT, HALF_FLOAT
+//     * Edge wrap mode can be selected explicitly
+// * Filter can be selected explicitly
+//--------------------------------
+
+typedef enum
+{
+ STBIR_EDGE_CLAMP = 0,
+ STBIR_EDGE_REFLECT = 1,
+ STBIR_EDGE_WRAP = 2, // this edge mode is slower and uses more memory
+ STBIR_EDGE_ZERO = 3,
+} stbir_edge;
+
+typedef enum
+{
+ STBIR_FILTER_DEFAULT = 0, // use same filter type that easy-to-use API chooses
+ STBIR_FILTER_BOX = 1, // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
+ STBIR_FILTER_TRIANGLE = 2, // On upsampling, produces same results as bilinear texture filtering
+  STBIR_FILTER_CUBICBSPLINE = 3,  // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
+  STBIR_FILTER_CATMULLROM   = 4,  // An interpolating cubic spline
+  STBIR_FILTER_MITCHELL     = 5,  // Mitchell-Netravali filter with B=1/3, C=1/3
+ STBIR_FILTER_POINT_SAMPLE = 6, // Simple point sampling
+ STBIR_FILTER_OTHER = 7, // User callback specified
+} stbir_filter;
+
+typedef enum
+{
+ STBIR_TYPE_UINT8 = 0,
+ STBIR_TYPE_UINT8_SRGB = 1,
+ STBIR_TYPE_UINT8_SRGB_ALPHA = 2, // alpha channel, when present, should also be SRGB (this is very unusual)
+ STBIR_TYPE_UINT16 = 3,
+ STBIR_TYPE_FLOAT = 4,
+ STBIR_TYPE_HALF_FLOAT = 5
+} stbir_datatype;
+
+// medium api
+STBIRDEF void * stbir_resize( const void *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_layout, stbir_datatype data_type,
+ stbir_edge edge, stbir_filter filter );
+//===============================================================
+
+
+
+//===============================================================
+// Extended-complexity API
+//
+// This API exposes all resize functionality.
+//
+// * Separate filter types for each axis
+// * Separate edge modes for each axis
+// * Separate input and output data types
+// * Can specify regions with subpixel correctness
+// * Can specify alpha flags
+// * Can specify a memory callback
+// * Can specify a callback data type for pixel input and output
+// * Can be threaded for a single resize
+// * Can be used to resize many frames without recalculating the sampler info
+//
+// Use this API as follows:
+// 1) Call the stbir_resize_init function on a local STBIR_RESIZE structure
+// 2) Call any of the stbir_set functions
+// 3) Optionally call stbir_build_samplers() if you are going to resample multiple times
+// with the same input and output dimensions (like resizing video frames)
+// 4) Resample by calling stbir_resize_extended().
+// 5) Call stbir_free_samplers() if you called stbir_build_samplers()
+//--------------------------------
+
+
+// Types:
+
+// INPUT CALLBACK: this callback is used for input scanlines
+typedef void const * stbir_input_callback( void * optional_output, void const * input_ptr, int num_pixels, int x, int y, void * context );
+
+// OUTPUT CALLBACK: this callback is used for output scanlines
+typedef void stbir_output_callback( void const * output_ptr, int num_pixels, int y, void * context );
+
+// callbacks for user installed filters
+typedef float stbir__kernel_callback( float x, float scale, void * user_data ); // centered at zero
+typedef float stbir__support_callback( float scale, void * user_data );
+
+// internal structure with precomputed scaling
+typedef struct stbir__info stbir__info;
+
+typedef struct STBIR_RESIZE // use the stbir_resize_init and stbir_override functions to set these values for future compatibility
+{
+ void * user_data;
+ void const * input_pixels;
+ int input_w, input_h;
+ double input_s0, input_t0, input_s1, input_t1;
+ stbir_input_callback * input_cb;
+ void * output_pixels;
+ int output_w, output_h;
+ int output_subx, output_suby, output_subw, output_subh;
+ stbir_output_callback * output_cb;
+ int input_stride_in_bytes;
+ int output_stride_in_bytes;
+ int splits;
+ int fast_alpha;
+ int needs_rebuild;
+ int called_alloc;
+ stbir_pixel_layout input_pixel_layout_public;
+ stbir_pixel_layout output_pixel_layout_public;
+ stbir_datatype input_data_type;
+ stbir_datatype output_data_type;
+ stbir_filter horizontal_filter, vertical_filter;
+ stbir_edge horizontal_edge, vertical_edge;
+ stbir__kernel_callback * horizontal_filter_kernel; stbir__support_callback * horizontal_filter_support;
+ stbir__kernel_callback * vertical_filter_kernel; stbir__support_callback * vertical_filter_support;
+ stbir__info * samplers;
+} STBIR_RESIZE;
+
+// extended complexity api
+
+
+// First off, you must ALWAYS call stbir_resize_init on your resize structure before any of the other calls!
+STBIRDEF void stbir_resize_init( STBIR_RESIZE * resize,
+ const void *input_pixels, int input_w, int input_h, int input_stride_in_bytes, // stride can be zero
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes, // stride can be zero
+ stbir_pixel_layout pixel_layout, stbir_datatype data_type );
+
+//===============================================================
+// You can update these parameters any time after resize_init and there is no cost
+//--------------------------------
+
+STBIRDEF void stbir_set_datatypes( STBIR_RESIZE * resize, stbir_datatype input_type, stbir_datatype output_type );
+STBIRDEF void stbir_set_pixel_callbacks( STBIR_RESIZE * resize, stbir_input_callback * input_cb, stbir_output_callback * output_cb ); // no callbacks by default
+STBIRDEF void stbir_set_user_data( STBIR_RESIZE * resize, void * user_data ); // pass back STBIR_RESIZE* by default
+STBIRDEF void stbir_set_buffer_ptrs( STBIR_RESIZE * resize, const void * input_pixels, int input_stride_in_bytes, void * output_pixels, int output_stride_in_bytes );
+
+//===============================================================
+
+
+//===============================================================
+// If you call any of these functions, you will trigger a sampler rebuild!
+//--------------------------------
+
+STBIRDEF int stbir_set_pixel_layouts( STBIR_RESIZE * resize, stbir_pixel_layout input_pixel_layout, stbir_pixel_layout output_pixel_layout ); // sets new buffer layouts
+STBIRDEF int stbir_set_edgemodes( STBIR_RESIZE * resize, stbir_edge horizontal_edge, stbir_edge vertical_edge ); // CLAMP by default
+
+STBIRDEF int stbir_set_filters( STBIR_RESIZE * resize, stbir_filter horizontal_filter, stbir_filter vertical_filter ); // STBIR_DEFAULT_FILTER_UPSAMPLE/DOWNSAMPLE by default
+STBIRDEF int stbir_set_filter_callbacks( STBIR_RESIZE * resize, stbir__kernel_callback * horizontal_filter, stbir__support_callback * horizontal_support, stbir__kernel_callback * vertical_filter, stbir__support_callback * vertical_support );
+
+STBIRDEF int stbir_set_pixel_subrect( STBIR_RESIZE * resize, int subx, int suby, int subw, int subh ); // sets both sub-regions (full regions by default)
+STBIRDEF int stbir_set_input_subrect( STBIR_RESIZE * resize, double s0, double t0, double s1, double t1 ); // sets input sub-region (full region by default)
+STBIRDEF int stbir_set_output_pixel_subrect( STBIR_RESIZE * resize, int subx, int suby, int subw, int subh ); // sets output sub-region (full region by default)
+
+// when inputting AND outputting non-premultiplied alpha pixels, we use a slower but higher quality technique
+// that fills the zero alpha pixel's RGB values with something plausible. If you don't care about areas of
+// zero alpha, you can call this function to get about a 25% speed improvement for STBIR_RGBA to STBIR_RGBA
+// types of resizes.
+STBIRDEF int stbir_set_non_pm_alpha_speed_over_quality( STBIR_RESIZE * resize, int non_pma_alpha_speed_over_quality );
+//===============================================================
+
+
+//===============================================================
+// You can call build_samplers to prebuild all the internal data we need to resample.
+// Then, if you call resize_extended many times with the same resize, you only pay the
+// cost once.
+// If you do call build_samplers, you MUST call free_samplers eventually.
+//--------------------------------
+
+// This builds the samplers and does one allocation
+STBIRDEF int stbir_build_samplers( STBIR_RESIZE * resize );
+
+// You MUST call this, if you call stbir_build_samplers or stbir_build_samplers_with_splits
+STBIRDEF void stbir_free_samplers( STBIR_RESIZE * resize );
+//===============================================================
+
+
+// And this is the main function to perform the resize synchronously on one thread.
+STBIRDEF int stbir_resize_extended( STBIR_RESIZE * resize );
+
+
+//===============================================================
+// Use these functions for multithreading.
+// 1) You call stbir_build_samplers_with_splits first on the main thread
+// 2) Then stbir_resize_with_split on each thread
+// 3) stbir_free_samplers when done on the main thread
+//--------------------------------
+
+// This will build samplers for threading.
+// You can pass in the number of threads you'd like to use (try_splits).
+// It returns the number of splits (threads) that you can call it with.
+// It might be less if the image resize can't be split up that many ways.
+
+STBIRDEF int stbir_build_samplers_with_splits( STBIR_RESIZE * resize, int try_splits );
+
+// This function does a split of the resizing (you call this function for each
+// split, on multiple threads). A split is a piece of the output resize pixel space.
+
+// Note that you MUST call stbir_build_samplers_with_splits before stbir_resize_extended_split!
+
+// Usually, you will call stbir_resize_extended_split with split_start set to
+// the thread index and "1" for the split_count.
+// But, if you have a weird situation where you MIGHT want 8 threads, but sometimes
+// only 4 threads, you can use 0,2,4,6 for the split_start's and use "2" for the
+// split_count each time to turn it into a 4 thread resize. (This is unusual.)
+
+STBIRDEF int stbir_resize_extended_split( STBIR_RESIZE * resize, int split_start, int split_count );
+//===============================================================
+
+
+//===============================================================
+// Pixel Callbacks info:
+//--------------------------------
+
+// The input callback is super flexible - it calls you with the input address
+// (based on the stride and base pointer), it gives you an optional_output
+// pointer that you can fill, or you can just return your own pointer into
+// your own data.
+//
+// You can also do conversion from non-supported data types if necessary - in
+// this case, you ignore the input_ptr and just use the x and y parameters to
+// calculate your own input_ptr based on the size of each non-supported pixel.
+// (Something like the third example below.)
+//
+// You can also install just an input or just an output callback by setting the
+// callback that you don't want to zero.
+//
+// First example, progress: (getting a callback that you can monitor the progress):
+// void const * my_callback( void * optional_output, void const * input_ptr, int num_pixels, int x, int y, void * context )
+// {
+// percentage_done = y / input_height;
+// return input_ptr; // use buffer from call
+// }
+//
+// Next example, copying: (copy from some other buffer or stream):
+// void const * my_callback( void * optional_output, void const * input_ptr, int num_pixels, int x, int y, void * context )
+// {
+// CopyOrStreamData( optional_output, other_data_src, num_pixels * pixel_width_in_bytes );
+// return optional_output; // return the optional buffer that we filled
+// }
+//
+// Third example, input another buffer without copying: (zero-copy from other buffer):
+// void const * my_callback( void * optional_output, void const * input_ptr, int num_pixels, int x, int y, void * context )
+// {
+// void * pixels = ( (char*) other_image_base ) + ( y * other_image_stride ) + ( x * other_pixel_width_in_bytes );
+// return pixels; // return pointer to your data without copying
+// }
+//
+//
+// The output callback is considerably simpler - it just calls you so that you can dump
+// out each scanline. You could even directly copy out to disk if you have a simple format
+// like TGA or BMP. You can also convert to other output types here if you want.
+//
+// Simple example:
+//    void my_output( void const * output_ptr, int num_pixels, int y, void * context )
+// {
+// percentage_done = y / output_height;
+// fwrite( output_ptr, pixel_width_in_bytes, num_pixels, output_file );
+// }
+//===============================================================
+
+
+
+
+//===============================================================
+// optional built-in profiling API
+//--------------------------------
+
+#ifdef STBIR_PROFILE
+
+typedef struct STBIR_PROFILE_INFO
+{
+ stbir_uint64 total_clocks;
+
+ // how many clocks spent (of total_clocks) in the various resize routines, along with a string description
+ // there are "resize_count" number of zones
+ stbir_uint64 clocks[ 8 ];
+ char const ** descriptions;
+
+ // count of clocks and descriptions
+ stbir_uint32 count;
+} STBIR_PROFILE_INFO;
+
+// use after calling stbir_resize_extended (or stbir_build_samplers or stbir_build_samplers_with_splits)
+STBIRDEF void stbir_resize_build_profile_info( STBIR_PROFILE_INFO * out_info, STBIR_RESIZE const * resize );
+
+// use after calling stbir_resize_extended
+STBIRDEF void stbir_resize_extended_profile_info( STBIR_PROFILE_INFO * out_info, STBIR_RESIZE const * resize );
+
+// use after calling stbir_resize_extended_split
+STBIRDEF void stbir_resize_split_profile_info( STBIR_PROFILE_INFO * out_info, STBIR_RESIZE const * resize, int split_start, int split_num );
+
+//===============================================================
+
+#endif
+
+
+//// end header file /////////////////////////////////////////////////////
+#endif // STBIR_INCLUDE_STB_IMAGE_RESIZE2_H
+
+#if defined(STB_IMAGE_RESIZE_IMPLEMENTATION) || defined(STB_IMAGE_RESIZE2_IMPLEMENTATION)
+
+#ifndef STBIR_ASSERT
+#include <assert.h>
+#define STBIR_ASSERT(x) assert(x)
+#endif
+
+#ifndef STBIR_MALLOC
+#include <stdlib.h>
+#define STBIR_MALLOC(size,user_data) ((void)(user_data), malloc(size))
+#define STBIR_FREE(ptr,user_data) ((void)(user_data), free(ptr))
+// (we used the comma operator to evaluate user_data, to avoid "unused parameter" warnings)
+#endif
+
+#ifdef _MSC_VER
+
+#define stbir__inline __forceinline
+
+#else
+
+#define stbir__inline __inline__
+
+// Clang address sanitizer
+#if defined(__has_feature)
+ #if __has_feature(address_sanitizer) || __has_feature(memory_sanitizer)
+ #ifndef STBIR__SEPARATE_ALLOCATIONS
+ #define STBIR__SEPARATE_ALLOCATIONS
+ #endif
+ #endif
+#endif
+
+#endif
+
+// GCC and MSVC
+#if defined(__SANITIZE_ADDRESS__)
+ #ifndef STBIR__SEPARATE_ALLOCATIONS
+ #define STBIR__SEPARATE_ALLOCATIONS
+ #endif
+#endif
+
+// Always turn off automatic FMA use - use STBIR_USE_FMA if you want.
+// Otherwise, this is a determinism disaster.
+#ifndef STBIR_DONT_CHANGE_FP_CONTRACT // override in case you don't want this behavior
+#if defined(_MSC_VER) && !defined(__clang__)
+#if _MSC_VER > 1200
+#pragma fp_contract(off)
+#endif
+#elif defined(__GNUC__) && !defined(__clang__)
+#pragma GCC optimize("fp-contract=off")
+#else
+#pragma STDC FP_CONTRACT OFF
+#endif
+#endif
+
+#ifdef _MSC_VER
+#define STBIR__UNUSED(v) (void)(v)
+#else
+#define STBIR__UNUSED(v) (void)sizeof(v)
+#endif
+
+#define STBIR__ARRAY_SIZE(a) (sizeof((a))/sizeof((a)[0]))
+
+
+#ifndef STBIR_DEFAULT_FILTER_UPSAMPLE
+#define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_CATMULLROM
+#endif
+
+#ifndef STBIR_DEFAULT_FILTER_DOWNSAMPLE
+#define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_MITCHELL
+#endif
+
+
+#ifndef STBIR__HEADER_FILENAME
+#define STBIR__HEADER_FILENAME "stb_image_resize2.h"
+#endif
+
+// the internal pixel layout enums are in a different order, so we can easily do range comparisons of types
+// the public pixel layout is ordered in a way that if you cast num_channels (1-4) to the enum, you get something sensible
+typedef enum
+{
+ STBIRI_1CHANNEL = 0,
+ STBIRI_2CHANNEL = 1,
+ STBIRI_RGB = 2,
+ STBIRI_BGR = 3,
+ STBIRI_4CHANNEL = 4,
+
+ STBIRI_RGBA = 5,
+ STBIRI_BGRA = 6,
+ STBIRI_ARGB = 7,
+ STBIRI_ABGR = 8,
+ STBIRI_RA = 9,
+ STBIRI_AR = 10,
+
+ STBIRI_RGBA_PM = 11,
+ STBIRI_BGRA_PM = 12,
+ STBIRI_ARGB_PM = 13,
+ STBIRI_ABGR_PM = 14,
+ STBIRI_RA_PM = 15,
+ STBIRI_AR_PM = 16,
+} stbir_internal_pixel_layout;
+
+// define the public pixel layouts to not compile inside the implementation (to avoid accidental use)
+#define STBIR_BGR bad_dont_use_in_implementation
+#define STBIR_1CHANNEL STBIR_BGR
+#define STBIR_2CHANNEL STBIR_BGR
+#define STBIR_RGB STBIR_BGR
+#define STBIR_RGBA STBIR_BGR
+#define STBIR_4CHANNEL STBIR_BGR
+#define STBIR_BGRA STBIR_BGR
+#define STBIR_ARGB STBIR_BGR
+#define STBIR_ABGR STBIR_BGR
+#define STBIR_RA STBIR_BGR
+#define STBIR_AR STBIR_BGR
+#define STBIR_RGBA_PM STBIR_BGR
+#define STBIR_BGRA_PM STBIR_BGR
+#define STBIR_ARGB_PM STBIR_BGR
+#define STBIR_ABGR_PM STBIR_BGR
+#define STBIR_RA_PM STBIR_BGR
+#define STBIR_AR_PM STBIR_BGR
+
+// must match stbir_datatype
+static unsigned char stbir__type_size[] = {
+ 1,1,1,2,4,2 // STBIR_TYPE_UINT8,STBIR_TYPE_UINT8_SRGB,STBIR_TYPE_UINT8_SRGB_ALPHA,STBIR_TYPE_UINT16,STBIR_TYPE_FLOAT,STBIR_TYPE_HALF_FLOAT
+};
+
+// When gathering, the contributors are which source pixels contribute.
+// When scattering, the contributors are which destination pixels are contributed to.
+typedef struct
+{
+ int n0; // First contributing pixel
+ int n1; // Last contributing pixel
+} stbir__contributors;
+
+typedef struct
+{
+ int lowest; // First sample index for whole filter
+ int highest; // Last sample index for whole filter
+ int widest; // widest single set of samples for an output
+} stbir__filter_extent_info;
+
+typedef struct
+{
+ int n0; // First pixel of decode buffer to write to
+ int n1; // Last pixel of decode that will be written to
+ int pixel_offset_for_input; // Pixel offset into input_scanline
+} stbir__span;
+
+typedef struct stbir__scale_info
+{
+ int input_full_size;
+ int output_sub_size;
+ float scale;
+ float inv_scale;
+ float pixel_shift; // starting shift in output pixel space (in pixels)
+ int scale_is_rational;
+ stbir_uint32 scale_numerator, scale_denominator;
+} stbir__scale_info;
+
+typedef struct
+{
+ stbir__contributors * contributors;
+ float* coefficients;
+ stbir__contributors * gather_prescatter_contributors;
+ float * gather_prescatter_coefficients;
+ stbir__scale_info scale_info;
+ float support;
+ stbir_filter filter_enum;
+ stbir__kernel_callback * filter_kernel;
+ stbir__support_callback * filter_support;
+ stbir_edge edge;
+ int coefficient_width;
+ int filter_pixel_width;
+ int filter_pixel_margin;
+ int num_contributors;
+ int contributors_size;
+ int coefficients_size;
+ stbir__filter_extent_info extent_info;
+ int is_gather; // 0 = scatter, 1 = gather with scale >= 1, 2 = gather with scale < 1
+ int gather_prescatter_num_contributors;
+ int gather_prescatter_coefficient_width;
+ int gather_prescatter_contributors_size;
+ int gather_prescatter_coefficients_size;
+} stbir__sampler;
+
+typedef struct
+{
+ stbir__contributors conservative;
+ int edge_sizes[2]; // this can be less than filter_pixel_margin, if the filter and scaling falls off
+ stbir__span spans[2]; // can be two spans, if doing input subrect with clamp mode WRAP
+} stbir__extents;
+
+typedef struct
+{
+#ifdef STBIR_PROFILE
+ union
+ {
+ struct { stbir_uint64 total, looping, vertical, horizontal, decode, encode, alpha, unalpha; } named;
+ stbir_uint64 array[8];
+ } profile;
+ stbir_uint64 * current_zone_excluded_ptr;
+#endif
+ float* decode_buffer;
+
+ int ring_buffer_first_scanline;
+ int ring_buffer_last_scanline;
+ int ring_buffer_begin_index; // first_scanline is at this index in the ring buffer
+ int start_output_y, end_output_y;
+ int start_input_y, end_input_y; // used in scatter only
+
+ #ifdef STBIR__SEPARATE_ALLOCATIONS
+ float** ring_buffers; // one pointer for each ring buffer
+ #else
+ float* ring_buffer; // one big buffer that we index into
+ #endif
+
+ float* vertical_buffer;
+
+ char no_cache_straddle[64];
+} stbir__per_split_info;
+
+typedef float * stbir__decode_pixels_func( float * decode, int width_times_channels, void const * input );
+typedef void stbir__alpha_weight_func( float * decode_buffer, int width_times_channels );
+typedef void stbir__horizontal_gather_channels_func( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer,
+ stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width );
+typedef void stbir__alpha_unweight_func(float * encode_buffer, int width_times_channels );
+typedef void stbir__encode_pixels_func( void * output, int width_times_channels, float const * encode );
+
+struct stbir__info
+{
+#ifdef STBIR_PROFILE
+ union
+ {
+ struct { stbir_uint64 total, build, alloc, horizontal, vertical, cleanup, pivot; } named;
+ stbir_uint64 array[7];
+ } profile;
+ stbir_uint64 * current_zone_excluded_ptr;
+#endif
+ stbir__sampler horizontal;
+ stbir__sampler vertical;
+
+ void const * input_data;
+ void * output_data;
+
+ int input_stride_bytes;
+ int output_stride_bytes;
+ int ring_buffer_length_bytes; // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
+ int ring_buffer_num_entries; // Total number of entries in the ring buffer.
+
+ stbir_datatype input_type;
+ stbir_datatype output_type;
+
+ stbir_input_callback * in_pixels_cb;
+ void * user_data;
+ stbir_output_callback * out_pixels_cb;
+
+ stbir__extents scanline_extents;
+
+ void * alloced_mem;
+ stbir__per_split_info * split_info; // by default 1, but there will be N of these allocated based on the thread init you did
+
+ stbir__decode_pixels_func * decode_pixels;
+ stbir__alpha_weight_func * alpha_weight;
+ stbir__horizontal_gather_channels_func * horizontal_gather_channels;
+ stbir__alpha_unweight_func * alpha_unweight;
+ stbir__encode_pixels_func * encode_pixels;
+
+ int alloc_ring_buffer_num_entries; // Number of entries in the ring buffer that will be allocated
+ int splits; // count of splits
+
+ stbir_internal_pixel_layout input_pixel_layout_internal;
+ stbir_internal_pixel_layout output_pixel_layout_internal;
+
+ int input_color_and_type;
+ int offset_x, offset_y; // offset within output_data
+ int vertical_first;
+ int channels;
+ int effective_channels; // same as channels, except on RGBA/ARGB (7), or XA/AX (3)
+ size_t alloced_total;
+};
+
+
+#define stbir__max_uint8_as_float 255.0f
+#define stbir__max_uint16_as_float 65535.0f
+#define stbir__max_uint8_as_float_inverted 3.9215689e-03f // (1.0f/255.0f)
+#define stbir__max_uint16_as_float_inverted 1.5259022e-05f // (1.0f/65535.0f)
+#define stbir__small_float ((float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20))
+
+// min/max friendly
+#define STBIR_CLAMP(x, xmin, xmax) for(;;) { \
+ if ( (x) < (xmin) ) (x) = (xmin); \
+ if ( (x) > (xmax) ) (x) = (xmax); \
+ break; \
+}
+
+static stbir__inline int stbir__min(int a, int b)
+{
+ return a < b ? a : b;
+}
+
+static stbir__inline int stbir__max(int a, int b)
+{
+ return a > b ? a : b;
+}
+
+static float stbir__srgb_uchar_to_linear_float[256] = {
+ 0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
+ 0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
+ 0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
+ 0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
+ 0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
+ 0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
+ 0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
+ 0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
+ 0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
+ 0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
+ 0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
+ 0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
+ 0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
+ 0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
+ 0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
+ 0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
+ 0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
+ 0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
+ 0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
+ 0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
+ 0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
+ 0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
+ 0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
+ 0.982251f, 0.991102f, 1.0f
+};
+
+typedef union
+{
+ unsigned int u;
+ float f;
+} stbir__FP32;
+
+// From https://gist.github.com/rygorous/2203834
+
+static const stbir_uint32 fp32_to_srgb8_tab4[104] = {
+ 0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
+ 0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
+ 0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
+ 0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
+ 0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
+ 0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
+ 0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
+ 0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
+ 0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
+ 0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
+ 0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
+ 0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
+ 0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
+};
+
+static stbir__inline stbir_uint8 stbir__linear_to_srgb_uchar(float in)
+{
+ static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
+ static const stbir__FP32 minval = { (127-13) << 23 };
+ stbir_uint32 tab,bias,scale,t;
+ stbir__FP32 f;
+
+ // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
+ // The tests are carefully written so that NaNs map to 0, same as in the reference
+ // implementation.
+ if (!(in > minval.f)) // written this way to catch NaNs
+ return 0;
+ if (in > almostone.f)
+ return 255;
+
+ // Do the table lookup and unpack bias, scale
+ f.f = in;
+ tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
+ bias = (tab >> 16) << 9;
+ scale = tab & 0xffff;
+
+ // Grab next-highest mantissa bits and perform linear interpolation
+ t = (f.u >> 12) & 0xff;
+ return (unsigned char) ((bias + scale*t) >> 16);
+}
+
+#ifndef STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT
+#define STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT 32 // when downsampling and <= 32 scanlines of buffering, use gather. gather used down to 1/8th scaling for 25% win.
+#endif
+
+#ifndef STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS
+#define STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS 4 // when threading, what is the minimum number of scanlines for a split?
+#endif
+
+#define STBIR_INPUT_CALLBACK_PADDING 3
+
+#ifdef _M_IX86_FP
+#if ( _M_IX86_FP >= 1 )
+#ifndef STBIR_SSE
+#define STBIR_SSE
+#endif
+#endif
+#endif
+
+#ifdef __TINYC__
+ // tiny c has no intrinsics yet - this can become a version check if they add them
+ #define STBIR_NO_SIMD
+#endif
+
+#if defined(_x86_64) || defined( __x86_64__ ) || defined( _M_X64 ) || defined(__x86_64) || defined(_M_AMD64) || defined(__SSE2__) || defined(STBIR_SSE) || defined(STBIR_SSE2)
+ #ifndef STBIR_SSE2
+ #define STBIR_SSE2
+ #endif
+ #if defined(__AVX__) || defined(STBIR_AVX2)
+ #ifndef STBIR_AVX
+ #ifndef STBIR_NO_AVX
+ #define STBIR_AVX
+ #endif
+ #endif
+ #endif
+ #if defined(__AVX2__) || defined(STBIR_AVX2)
+ #ifndef STBIR_NO_AVX2
+ #ifndef STBIR_AVX2
+ #define STBIR_AVX2
+ #endif
+ #if defined( _MSC_VER ) && !defined(__clang__)
+        #ifndef STBIR_FP16C // FP16C instructions are on all AVX2 cpus, so we can autoselect it here on microsoft - clang needs -mf16c
+ #define STBIR_FP16C
+ #endif
+ #endif
+ #endif
+ #endif
+ #ifdef __F16C__
+ #ifndef STBIR_FP16C // turn on FP16C instructions if the define is set (for clang and gcc)
+ #define STBIR_FP16C
+ #endif
+ #endif
+#endif
+
+#if defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ ) || ((__ARM_NEON_FP & 4) != 0) || defined(__ARM_NEON__)
+#ifndef STBIR_NEON
+#define STBIR_NEON
+#endif
+#endif
+
+#if defined(_M_ARM) || defined(__arm__)
+#ifdef STBIR_USE_FMA
+#undef STBIR_USE_FMA // no FMA for 32-bit arm on MSVC
+#endif
+#endif
+
+#if defined(__wasm__) && defined(__wasm_simd128__)
+#ifndef STBIR_WASM
+#define STBIR_WASM
+#endif
+#endif
+
+// restrict pointers for the output pointers, other loop and unroll control
+#if defined( _MSC_VER ) && !defined(__clang__)
+ #define STBIR_STREAMOUT_PTR( star ) star __restrict
+ #define STBIR_NO_UNROLL( ptr ) __assume(ptr) // this oddly keeps msvc from unrolling a loop
+ #if _MSC_VER >= 1900
+ #define STBIR_NO_UNROLL_LOOP_START __pragma(loop( no_vector ))
+ #else
+ #define STBIR_NO_UNROLL_LOOP_START
+ #endif
+#elif defined( __clang__ )
+ #define STBIR_STREAMOUT_PTR( star ) star __restrict__
+ #define STBIR_NO_UNROLL( ptr ) __asm__ (""::"r"(ptr))
+ #if ( __clang_major__ >= 4 ) || ( ( __clang_major__ >= 3 ) && ( __clang_minor__ >= 5 ) )
+ #define STBIR_NO_UNROLL_LOOP_START _Pragma("clang loop unroll(disable)") _Pragma("clang loop vectorize(disable)")
+ #else
+ #define STBIR_NO_UNROLL_LOOP_START
+ #endif
+#elif defined( __GNUC__ )
+ #define STBIR_STREAMOUT_PTR( star ) star __restrict__
+ #define STBIR_NO_UNROLL( ptr ) __asm__ (""::"r"(ptr))
+ #if __GNUC__ >= 14
+ #define STBIR_NO_UNROLL_LOOP_START _Pragma("GCC unroll 0") _Pragma("GCC novector")
+ #else
+ #define STBIR_NO_UNROLL_LOOP_START
+ #endif
+ #define STBIR_NO_UNROLL_LOOP_START_INF_FOR
+#else
+ #define STBIR_STREAMOUT_PTR( star ) star
+ #define STBIR_NO_UNROLL( ptr )
+ #define STBIR_NO_UNROLL_LOOP_START
+#endif
+
+#ifndef STBIR_NO_UNROLL_LOOP_START_INF_FOR
+#define STBIR_NO_UNROLL_LOOP_START_INF_FOR STBIR_NO_UNROLL_LOOP_START
+#endif
+
+#ifdef STBIR_NO_SIMD // force simd off for whatever reason
+
+// force simd off overrides everything else, so clear it all
+
+#ifdef STBIR_SSE2
+#undef STBIR_SSE2
+#endif
+
+#ifdef STBIR_AVX
+#undef STBIR_AVX
+#endif
+
+#ifdef STBIR_NEON
+#undef STBIR_NEON
+#endif
+
+#ifdef STBIR_AVX2
+#undef STBIR_AVX2
+#endif
+
+#ifdef STBIR_FP16C
+#undef STBIR_FP16C
+#endif
+
+#ifdef STBIR_WASM
+#undef STBIR_WASM
+#endif
+
+#ifdef STBIR_SIMD
+#undef STBIR_SIMD
+#endif
+
+#else // STBIR_SIMD
+
+#ifdef STBIR_SSE2
+  #include <emmintrin.h>
+
+ #define stbir__simdf __m128
+ #define stbir__simdi __m128i
+
+ #define stbir_simdi_castf( reg ) _mm_castps_si128(reg)
+ #define stbir_simdf_casti( reg ) _mm_castsi128_ps(reg)
+
+ #define stbir__simdf_load( reg, ptr ) (reg) = _mm_loadu_ps( (float const*)(ptr) )
+ #define stbir__simdi_load( reg, ptr ) (reg) = _mm_loadu_si128 ( (stbir__simdi const*)(ptr) )
+ #define stbir__simdf_load1( out, ptr ) (out) = _mm_load_ss( (float const*)(ptr) ) // top values can be random (not denormal or nan for perf)
+ #define stbir__simdi_load1( out, ptr ) (out) = _mm_castps_si128( _mm_load_ss( (float const*)(ptr) ))
+ #define stbir__simdf_load1z( out, ptr ) (out) = _mm_load_ss( (float const*)(ptr) ) // top values must be zero
+ #define stbir__simdf_frep4( fvar ) _mm_set_ps1( fvar )
+ #define stbir__simdf_load1frep4( out, fvar ) (out) = _mm_set_ps1( fvar )
+ #define stbir__simdf_load2( out, ptr ) (out) = _mm_castsi128_ps( _mm_loadl_epi64( (__m128i*)(ptr)) ) // top values can be random (not denormal or nan for perf)
+ #define stbir__simdf_load2z( out, ptr ) (out) = _mm_castsi128_ps( _mm_loadl_epi64( (__m128i*)(ptr)) ) // top values must be zero
+ #define stbir__simdf_load2hmerge( out, reg, ptr ) (out) = _mm_castpd_ps(_mm_loadh_pd( _mm_castps_pd(reg), (double*)(ptr) ))
+
+ #define stbir__simdf_zeroP() _mm_setzero_ps()
+ #define stbir__simdf_zero( reg ) (reg) = _mm_setzero_ps()
+
+ #define stbir__simdf_store( ptr, reg ) _mm_storeu_ps( (float*)(ptr), reg )
+ #define stbir__simdf_store1( ptr, reg ) _mm_store_ss( (float*)(ptr), reg )
+ #define stbir__simdf_store2( ptr, reg ) _mm_storel_epi64( (__m128i*)(ptr), _mm_castps_si128(reg) )
+ #define stbir__simdf_store2h( ptr, reg ) _mm_storeh_pd( (double*)(ptr), _mm_castps_pd(reg) )
+
+ #define stbir__simdi_store( ptr, reg ) _mm_storeu_si128( (__m128i*)(ptr), reg )
+ #define stbir__simdi_store1( ptr, reg ) _mm_store_ss( (float*)(ptr), _mm_castsi128_ps(reg) )
+ #define stbir__simdi_store2( ptr, reg ) _mm_storel_epi64( (__m128i*)(ptr), (reg) )
+
+ #define stbir__prefetch( ptr ) _mm_prefetch((char*)(ptr), _MM_HINT_T0 )
+
+ #define stbir__simdi_expand_u8_to_u32(out0,out1,out2,out3,ireg) \
+ { \
+ stbir__simdi zero = _mm_setzero_si128(); \
+ out2 = _mm_unpacklo_epi8( ireg, zero ); \
+ out3 = _mm_unpackhi_epi8( ireg, zero ); \
+ out0 = _mm_unpacklo_epi16( out2, zero ); \
+ out1 = _mm_unpackhi_epi16( out2, zero ); \
+ out2 = _mm_unpacklo_epi16( out3, zero ); \
+ out3 = _mm_unpackhi_epi16( out3, zero ); \
+ }
+
+#define stbir__simdi_expand_u8_to_1u32(out,ireg) \
+ { \
+ stbir__simdi zero = _mm_setzero_si128(); \
+ out = _mm_unpacklo_epi8( ireg, zero ); \
+ out = _mm_unpacklo_epi16( out, zero ); \
+ }
+
+ #define stbir__simdi_expand_u16_to_u32(out0,out1,ireg) \
+ { \
+ stbir__simdi zero = _mm_setzero_si128(); \
+ out0 = _mm_unpacklo_epi16( ireg, zero ); \
+ out1 = _mm_unpackhi_epi16( ireg, zero ); \
+ }
+
+ #define stbir__simdf_convert_float_to_i32( i, f ) (i) = _mm_cvttps_epi32(f)
+ #define stbir__simdf_convert_float_to_int( f ) _mm_cvtt_ss2si(f)
+ #define stbir__simdf_convert_float_to_uint8( f ) ((unsigned char)_mm_cvtsi128_si32(_mm_cvttps_epi32(_mm_max_ps(_mm_min_ps(f,STBIR__CONSTF(STBIR_max_uint8_as_float)),_mm_setzero_ps()))))
+ #define stbir__simdf_convert_float_to_short( f ) ((unsigned short)_mm_cvtsi128_si32(_mm_cvttps_epi32(_mm_max_ps(_mm_min_ps(f,STBIR__CONSTF(STBIR_max_uint16_as_float)),_mm_setzero_ps()))))
+
+ #define stbir__simdi_to_int( i ) _mm_cvtsi128_si32(i)
+ #define stbir__simdi_convert_i32_to_float(out, ireg) (out) = _mm_cvtepi32_ps( ireg )
+ #define stbir__simdf_add( out, reg0, reg1 ) (out) = _mm_add_ps( reg0, reg1 )
+ #define stbir__simdf_mult( out, reg0, reg1 ) (out) = _mm_mul_ps( reg0, reg1 )
+ #define stbir__simdf_mult_mem( out, reg, ptr ) (out) = _mm_mul_ps( reg, _mm_loadu_ps( (float const*)(ptr) ) )
+ #define stbir__simdf_mult1_mem( out, reg, ptr ) (out) = _mm_mul_ss( reg, _mm_load_ss( (float const*)(ptr) ) )
+ #define stbir__simdf_add_mem( out, reg, ptr ) (out) = _mm_add_ps( reg, _mm_loadu_ps( (float const*)(ptr) ) )
+ #define stbir__simdf_add1_mem( out, reg, ptr ) (out) = _mm_add_ss( reg, _mm_load_ss( (float const*)(ptr) ) )
+
+ #ifdef STBIR_USE_FMA // not on by default to maintain bit identical simd to non-simd
+  #include <immintrin.h>
+ #define stbir__simdf_madd( out, add, mul1, mul2 ) (out) = _mm_fmadd_ps( mul1, mul2, add )
+ #define stbir__simdf_madd1( out, add, mul1, mul2 ) (out) = _mm_fmadd_ss( mul1, mul2, add )
+ #define stbir__simdf_madd_mem( out, add, mul, ptr ) (out) = _mm_fmadd_ps( mul, _mm_loadu_ps( (float const*)(ptr) ), add )
+ #define stbir__simdf_madd1_mem( out, add, mul, ptr ) (out) = _mm_fmadd_ss( mul, _mm_load_ss( (float const*)(ptr) ), add )
+ #else
+ #define stbir__simdf_madd( out, add, mul1, mul2 ) (out) = _mm_add_ps( add, _mm_mul_ps( mul1, mul2 ) )
+ #define stbir__simdf_madd1( out, add, mul1, mul2 ) (out) = _mm_add_ss( add, _mm_mul_ss( mul1, mul2 ) )
+ #define stbir__simdf_madd_mem( out, add, mul, ptr ) (out) = _mm_add_ps( add, _mm_mul_ps( mul, _mm_loadu_ps( (float const*)(ptr) ) ) )
+ #define stbir__simdf_madd1_mem( out, add, mul, ptr ) (out) = _mm_add_ss( add, _mm_mul_ss( mul, _mm_load_ss( (float const*)(ptr) ) ) )
+ #endif
+
+ #define stbir__simdf_add1( out, reg0, reg1 ) (out) = _mm_add_ss( reg0, reg1 )
+ #define stbir__simdf_mult1( out, reg0, reg1 ) (out) = _mm_mul_ss( reg0, reg1 )
+
+ #define stbir__simdf_and( out, reg0, reg1 ) (out) = _mm_and_ps( reg0, reg1 )
+ #define stbir__simdf_or( out, reg0, reg1 ) (out) = _mm_or_ps( reg0, reg1 )
+
+ #define stbir__simdf_min( out, reg0, reg1 ) (out) = _mm_min_ps( reg0, reg1 )
+ #define stbir__simdf_max( out, reg0, reg1 ) (out) = _mm_max_ps( reg0, reg1 )
+ #define stbir__simdf_min1( out, reg0, reg1 ) (out) = _mm_min_ss( reg0, reg1 )
+ #define stbir__simdf_max1( out, reg0, reg1 ) (out) = _mm_max_ss( reg0, reg1 )
+
+ #define stbir__simdf_0123ABCDto3ABx( out, reg0, reg1 ) (out)=_mm_castsi128_ps( _mm_shuffle_epi32( _mm_castps_si128( _mm_shuffle_ps( reg1,reg0, (0<<0) + (1<<2) + (2<<4) + (3<<6) )), (3<<0) + (0<<2) + (1<<4) + (2<<6) ) )
+ #define stbir__simdf_0123ABCDto23Ax( out, reg0, reg1 ) (out)=_mm_castsi128_ps( _mm_shuffle_epi32( _mm_castps_si128( _mm_shuffle_ps( reg1,reg0, (0<<0) + (1<<2) + (2<<4) + (3<<6) )), (2<<0) + (3<<2) + (0<<4) + (1<<6) ) )
+
+ static const stbir__simdf STBIR_zeroones = { 0.0f,1.0f,0.0f,1.0f };
+ static const stbir__simdf STBIR_onezeros = { 1.0f,0.0f,1.0f,0.0f };
+ #define stbir__simdf_aaa1( out, alp, ones ) (out)=_mm_castsi128_ps( _mm_shuffle_epi32( _mm_castps_si128( _mm_movehl_ps( ones, alp ) ), (1<<0) + (1<<2) + (1<<4) + (2<<6) ) )
+ #define stbir__simdf_1aaa( out, alp, ones ) (out)=_mm_castsi128_ps( _mm_shuffle_epi32( _mm_castps_si128( _mm_movelh_ps( ones, alp ) ), (0<<0) + (2<<2) + (2<<4) + (2<<6) ) )
+ #define stbir__simdf_a1a1( out, alp, ones) (out) = _mm_or_ps( _mm_castsi128_ps( _mm_srli_epi64( _mm_castps_si128(alp), 32 ) ), STBIR_zeroones )
+ #define stbir__simdf_1a1a( out, alp, ones) (out) = _mm_or_ps( _mm_castsi128_ps( _mm_slli_epi64( _mm_castps_si128(alp), 32 ) ), STBIR_onezeros )
+
+ #define stbir__simdf_swiz( reg, one, two, three, four ) _mm_castsi128_ps( _mm_shuffle_epi32( _mm_castps_si128( reg ), (one<<0) + (two<<2) + (three<<4) + (four<<6) ) )
+
+ #define stbir__simdi_and( out, reg0, reg1 ) (out) = _mm_and_si128( reg0, reg1 )
+ #define stbir__simdi_or( out, reg0, reg1 ) (out) = _mm_or_si128( reg0, reg1 )
+ #define stbir__simdi_16madd( out, reg0, reg1 ) (out) = _mm_madd_epi16( reg0, reg1 )
+
+ #define stbir__simdf_pack_to_8bytes(out,aa,bb) \
+ { \
+ stbir__simdf af,bf; \
+ stbir__simdi a,b; \
+ af = _mm_min_ps( aa, STBIR_max_uint8_as_float ); \
+ bf = _mm_min_ps( bb, STBIR_max_uint8_as_float ); \
+ af = _mm_max_ps( af, _mm_setzero_ps() ); \
+ bf = _mm_max_ps( bf, _mm_setzero_ps() ); \
+ a = _mm_cvttps_epi32( af ); \
+ b = _mm_cvttps_epi32( bf ); \
+ a = _mm_packs_epi32( a, b ); \
+ out = _mm_packus_epi16( a, a ); \
+ }
+
+ #define stbir__simdf_load4_transposed( o0, o1, o2, o3, ptr ) \
+ stbir__simdf_load( o0, (ptr) ); \
+ stbir__simdf_load( o1, (ptr)+4 ); \
+ stbir__simdf_load( o2, (ptr)+8 ); \
+ stbir__simdf_load( o3, (ptr)+12 ); \
+ { \
+ __m128 tmp0, tmp1, tmp2, tmp3; \
+ tmp0 = _mm_unpacklo_ps(o0, o1); \
+ tmp2 = _mm_unpacklo_ps(o2, o3); \
+ tmp1 = _mm_unpackhi_ps(o0, o1); \
+ tmp3 = _mm_unpackhi_ps(o2, o3); \
+ o0 = _mm_movelh_ps(tmp0, tmp2); \
+ o1 = _mm_movehl_ps(tmp2, tmp0); \
+ o2 = _mm_movelh_ps(tmp1, tmp3); \
+ o3 = _mm_movehl_ps(tmp3, tmp1); \
+ }
+
+ #define stbir__interleave_pack_and_store_16_u8( ptr, r0, r1, r2, r3 ) \
+ r0 = _mm_packs_epi32( r0, r1 ); \
+ r2 = _mm_packs_epi32( r2, r3 ); \
+ r1 = _mm_unpacklo_epi16( r0, r2 ); \
+ r3 = _mm_unpackhi_epi16( r0, r2 ); \
+ r0 = _mm_unpacklo_epi16( r1, r3 ); \
+ r2 = _mm_unpackhi_epi16( r1, r3 ); \
+ r0 = _mm_packus_epi16( r0, r2 ); \
+ stbir__simdi_store( ptr, r0 ); \
+
+ #define stbir__simdi_32shr( out, reg, imm ) out = _mm_srli_epi32( reg, imm )
+
+ #if defined(_MSC_VER) && !defined(__clang__)
+ // msvc inits with 8 bytes
+ #define STBIR__CONST_32_TO_8( v ) (char)(unsigned char)((v)&255),(char)(unsigned char)(((v)>>8)&255),(char)(unsigned char)(((v)>>16)&255),(char)(unsigned char)(((v)>>24)&255)
+ #define STBIR__CONST_4_32i( v ) STBIR__CONST_32_TO_8( v ), STBIR__CONST_32_TO_8( v ), STBIR__CONST_32_TO_8( v ), STBIR__CONST_32_TO_8( v )
+ #define STBIR__CONST_4d_32i( v0, v1, v2, v3 ) STBIR__CONST_32_TO_8( v0 ), STBIR__CONST_32_TO_8( v1 ), STBIR__CONST_32_TO_8( v2 ), STBIR__CONST_32_TO_8( v3 )
+ #else
+ // everything else inits with long long's
+ #define STBIR__CONST_4_32i( v ) (long long)((((stbir_uint64)(stbir_uint32)(v))<<32)|((stbir_uint64)(stbir_uint32)(v))),(long long)((((stbir_uint64)(stbir_uint32)(v))<<32)|((stbir_uint64)(stbir_uint32)(v)))
+ #define STBIR__CONST_4d_32i( v0, v1, v2, v3 ) (long long)((((stbir_uint64)(stbir_uint32)(v1))<<32)|((stbir_uint64)(stbir_uint32)(v0))),(long long)((((stbir_uint64)(stbir_uint32)(v3))<<32)|((stbir_uint64)(stbir_uint32)(v2)))
+ #endif
+
+ #define STBIR__SIMDF_CONST(var, x) stbir__simdf var = { x, x, x, x }
+ #define STBIR__SIMDI_CONST(var, x) stbir__simdi var = { STBIR__CONST_4_32i(x) }
+ #define STBIR__CONSTF(var) (var)
+ #define STBIR__CONSTI(var) (var)
+
+ #if defined(STBIR_AVX) || defined(__SSE4_1__)
+  #include <smmintrin.h>
+ #define stbir__simdf_pack_to_8words(out,reg0,reg1) out = _mm_packus_epi32(_mm_cvttps_epi32(_mm_max_ps(_mm_min_ps(reg0,STBIR__CONSTF(STBIR_max_uint16_as_float)),_mm_setzero_ps())), _mm_cvttps_epi32(_mm_max_ps(_mm_min_ps(reg1,STBIR__CONSTF(STBIR_max_uint16_as_float)),_mm_setzero_ps())))
+ #else
+ static STBIR__SIMDI_CONST(stbir__s32_32768, 32768);
+ static STBIR__SIMDI_CONST(stbir__s16_32768, ((32768<<16)|32768));
+
+ #define stbir__simdf_pack_to_8words(out,reg0,reg1) \
+ { \
+ stbir__simdi tmp0,tmp1; \
+ tmp0 = _mm_cvttps_epi32(_mm_max_ps(_mm_min_ps(reg0,STBIR__CONSTF(STBIR_max_uint16_as_float)),_mm_setzero_ps())); \
+ tmp1 = _mm_cvttps_epi32(_mm_max_ps(_mm_min_ps(reg1,STBIR__CONSTF(STBIR_max_uint16_as_float)),_mm_setzero_ps())); \
+ tmp0 = _mm_sub_epi32( tmp0, stbir__s32_32768 ); \
+ tmp1 = _mm_sub_epi32( tmp1, stbir__s32_32768 ); \
+ out = _mm_packs_epi32( tmp0, tmp1 ); \
+ out = _mm_sub_epi16( out, stbir__s16_32768 ); \
+ }
+
+ #endif
+
+ #define STBIR_SIMD
+
+ // if we detect AVX, set the simd8 defines
+ #ifdef STBIR_AVX
+  #include <immintrin.h>
+ #define STBIR_SIMD8
+ #define stbir__simdf8 __m256
+ #define stbir__simdi8 __m256i
+ #define stbir__simdf8_load( out, ptr ) (out) = _mm256_loadu_ps( (float const *)(ptr) )
+ #define stbir__simdi8_load( out, ptr ) (out) = _mm256_loadu_si256( (__m256i const *)(ptr) )
+ #define stbir__simdf8_mult( out, a, b ) (out) = _mm256_mul_ps( (a), (b) )
+ #define stbir__simdf8_store( ptr, out ) _mm256_storeu_ps( (float*)(ptr), out )
+ #define stbir__simdi8_store( ptr, reg ) _mm256_storeu_si256( (__m256i*)(ptr), reg )
+ #define stbir__simdf8_frep8( fval ) _mm256_set1_ps( fval )
+
+ #define stbir__simdf8_min( out, reg0, reg1 ) (out) = _mm256_min_ps( reg0, reg1 )
+ #define stbir__simdf8_max( out, reg0, reg1 ) (out) = _mm256_max_ps( reg0, reg1 )
+
+ #define stbir__simdf8_add4halves( out, bot4, top8 ) (out) = _mm_add_ps( bot4, _mm256_extractf128_ps( top8, 1 ) )
+ #define stbir__simdf8_mult_mem( out, reg, ptr ) (out) = _mm256_mul_ps( reg, _mm256_loadu_ps( (float const*)(ptr) ) )
+ #define stbir__simdf8_add_mem( out, reg, ptr ) (out) = _mm256_add_ps( reg, _mm256_loadu_ps( (float const*)(ptr) ) )
+ #define stbir__simdf8_add( out, a, b ) (out) = _mm256_add_ps( a, b )
+ #define stbir__simdf8_load1b( out, ptr ) (out) = _mm256_broadcast_ss( ptr )
+ #define stbir__simdf_load1rep4( out, ptr ) (out) = _mm_broadcast_ss( ptr ) // avx load instruction
+
+ #define stbir__simdi8_convert_i32_to_float(out, ireg) (out) = _mm256_cvtepi32_ps( ireg )
+ #define stbir__simdf8_convert_float_to_i32( i, f ) (i) = _mm256_cvttps_epi32(f)
+
+ #define stbir__simdf8_bot4s( out, a, b ) (out) = _mm256_permute2f128_ps(a,b, (0<<0)+(2<<4) )
+ #define stbir__simdf8_top4s( out, a, b ) (out) = _mm256_permute2f128_ps(a,b, (1<<0)+(3<<4) )
+
+ #define stbir__simdf8_gettop4( reg ) _mm256_extractf128_ps(reg,1)
+
+ #ifdef STBIR_AVX2
+
+ #define stbir__simdi8_expand_u8_to_u32(out0,out1,ireg) \
+ { \
+ stbir__simdi8 a, zero =_mm256_setzero_si256();\
+ a = _mm256_permute4x64_epi64( _mm256_unpacklo_epi8( _mm256_permute4x64_epi64(_mm256_castsi128_si256(ireg),(0<<0)+(2<<2)+(1<<4)+(3<<6)), zero ),(0<<0)+(2<<2)+(1<<4)+(3<<6)); \
+ out0 = _mm256_unpacklo_epi16( a, zero ); \
+ out1 = _mm256_unpackhi_epi16( a, zero ); \
+ }
+
+ #define stbir__simdf8_pack_to_16bytes(out,aa,bb) \
+ { \
+ stbir__simdi8 t; \
+ stbir__simdf8 af,bf; \
+ stbir__simdi8 a,b; \
+ af = _mm256_min_ps( aa, STBIR_max_uint8_as_floatX ); \
+ bf = _mm256_min_ps( bb, STBIR_max_uint8_as_floatX ); \
+ af = _mm256_max_ps( af, _mm256_setzero_ps() ); \
+ bf = _mm256_max_ps( bf, _mm256_setzero_ps() ); \
+ a = _mm256_cvttps_epi32( af ); \
+ b = _mm256_cvttps_epi32( bf ); \
+ t = _mm256_permute4x64_epi64( _mm256_packs_epi32( a, b ), (0<<0)+(2<<2)+(1<<4)+(3<<6) ); \
+ out = _mm256_castsi256_si128( _mm256_permute4x64_epi64( _mm256_packus_epi16( t, t ), (0<<0)+(2<<2)+(1<<4)+(3<<6) ) ); \
+ }
+
+ #define stbir__simdi8_expand_u16_to_u32(out,ireg) out = _mm256_unpacklo_epi16( _mm256_permute4x64_epi64(_mm256_castsi128_si256(ireg),(0<<0)+(2<<2)+(1<<4)+(3<<6)), _mm256_setzero_si256() );
+
+ #define stbir__simdf8_pack_to_16words(out,aa,bb) \
+ { \
+ stbir__simdf8 af,bf; \
+ stbir__simdi8 a,b; \
+ af = _mm256_min_ps( aa, STBIR_max_uint16_as_floatX ); \
+ bf = _mm256_min_ps( bb, STBIR_max_uint16_as_floatX ); \
+ af = _mm256_max_ps( af, _mm256_setzero_ps() ); \
+ bf = _mm256_max_ps( bf, _mm256_setzero_ps() ); \
+ a = _mm256_cvttps_epi32( af ); \
+ b = _mm256_cvttps_epi32( bf ); \
+ (out) = _mm256_permute4x64_epi64( _mm256_packus_epi32(a, b), (0<<0)+(2<<2)+(1<<4)+(3<<6) ); \
+ }
+
+ #else
+
+ #define stbir__simdi8_expand_u8_to_u32(out0,out1,ireg) \
+ { \
+ stbir__simdi a,zero = _mm_setzero_si128(); \
+ a = _mm_unpacklo_epi8( ireg, zero ); \
+ out0 = _mm256_setr_m128i( _mm_unpacklo_epi16( a, zero ), _mm_unpackhi_epi16( a, zero ) ); \
+ a = _mm_unpackhi_epi8( ireg, zero ); \
+ out1 = _mm256_setr_m128i( _mm_unpacklo_epi16( a, zero ), _mm_unpackhi_epi16( a, zero ) ); \
+ }
+
+ #define stbir__simdf8_pack_to_16bytes(out,aa,bb) \
+ { \
+ stbir__simdi t; \
+ stbir__simdf8 af,bf; \
+ stbir__simdi8 a,b; \
+ af = _mm256_min_ps( aa, STBIR_max_uint8_as_floatX ); \
+ bf = _mm256_min_ps( bb, STBIR_max_uint8_as_floatX ); \
+ af = _mm256_max_ps( af, _mm256_setzero_ps() ); \
+ bf = _mm256_max_ps( bf, _mm256_setzero_ps() ); \
+ a = _mm256_cvttps_epi32( af ); \
+ b = _mm256_cvttps_epi32( bf ); \
+ out = _mm_packs_epi32( _mm256_castsi256_si128(a), _mm256_extractf128_si256( a, 1 ) ); \
+ out = _mm_packus_epi16( out, out ); \
+ t = _mm_packs_epi32( _mm256_castsi256_si128(b), _mm256_extractf128_si256( b, 1 ) ); \
+ t = _mm_packus_epi16( t, t ); \
+ out = _mm_castps_si128( _mm_shuffle_ps( _mm_castsi128_ps(out), _mm_castsi128_ps(t), (0<<0)+(1<<2)+(0<<4)+(1<<6) ) ); \
+ }
+
+ #define stbir__simdi8_expand_u16_to_u32(out,ireg) \
+ { \
+ stbir__simdi a,b,zero = _mm_setzero_si128(); \
+ a = _mm_unpacklo_epi16( ireg, zero ); \
+ b = _mm_unpackhi_epi16( ireg, zero ); \
+ out = _mm256_insertf128_si256( _mm256_castsi128_si256( a ), b, 1 ); \
+ }
+
+ #define stbir__simdf8_pack_to_16words(out,aa,bb) \
+ { \
+ stbir__simdi t0,t1; \
+ stbir__simdf8 af,bf; \
+ stbir__simdi8 a,b; \
+ af = _mm256_min_ps( aa, STBIR_max_uint16_as_floatX ); \
+ bf = _mm256_min_ps( bb, STBIR_max_uint16_as_floatX ); \
+ af = _mm256_max_ps( af, _mm256_setzero_ps() ); \
+ bf = _mm256_max_ps( bf, _mm256_setzero_ps() ); \
+ a = _mm256_cvttps_epi32( af ); \
+ b = _mm256_cvttps_epi32( bf ); \
+ t0 = _mm_packus_epi32( _mm256_castsi256_si128(a), _mm256_extractf128_si256( a, 1 ) ); \
+ t1 = _mm_packus_epi32( _mm256_castsi256_si128(b), _mm256_extractf128_si256( b, 1 ) ); \
+ out = _mm256_setr_m128i( t0, t1 ); \
+ }
+
+ #endif
+
+ static __m256i stbir_00001111 = { STBIR__CONST_4d_32i( 0, 0, 0, 0 ), STBIR__CONST_4d_32i( 1, 1, 1, 1 ) };
+ #define stbir__simdf8_0123to00001111( out, in ) (out) = _mm256_permutevar_ps ( in, stbir_00001111 )
+
+ static __m256i stbir_22223333 = { STBIR__CONST_4d_32i( 2, 2, 2, 2 ), STBIR__CONST_4d_32i( 3, 3, 3, 3 ) };
+ #define stbir__simdf8_0123to22223333( out, in ) (out) = _mm256_permutevar_ps ( in, stbir_22223333 )
+
+ #define stbir__simdf8_0123to2222( out, in ) (out) = stbir__simdf_swiz(_mm256_castps256_ps128(in), 2,2,2,2 )
+
+ #define stbir__simdf8_load4b( out, ptr ) (out) = _mm256_broadcast_ps( (__m128 const *)(ptr) )
+
+ static __m256i stbir_00112233 = { STBIR__CONST_4d_32i( 0, 0, 1, 1 ), STBIR__CONST_4d_32i( 2, 2, 3, 3 ) };
+ #define stbir__simdf8_0123to00112233( out, in ) (out) = _mm256_permutevar_ps ( in, stbir_00112233 )
+ #define stbir__simdf8_add4( out, a8, b ) (out) = _mm256_add_ps( a8, _mm256_castps128_ps256( b ) )
+
+ static __m256i stbir_load6 = { STBIR__CONST_4_32i( 0x80000000 ), STBIR__CONST_4d_32i( 0x80000000, 0x80000000, 0, 0 ) };
+ #define stbir__simdf8_load6z( out, ptr ) (out) = _mm256_maskload_ps( ptr, stbir_load6 )
+
+ #define stbir__simdf8_0123to00000000( out, in ) (out) = _mm256_shuffle_ps ( in, in, (0<<0)+(0<<2)+(0<<4)+(0<<6) )
+ #define stbir__simdf8_0123to11111111( out, in ) (out) = _mm256_shuffle_ps ( in, in, (1<<0)+(1<<2)+(1<<4)+(1<<6) )
+ #define stbir__simdf8_0123to22222222( out, in ) (out) = _mm256_shuffle_ps ( in, in, (2<<0)+(2<<2)+(2<<4)+(2<<6) )
+ #define stbir__simdf8_0123to33333333( out, in ) (out) = _mm256_shuffle_ps ( in, in, (3<<0)+(3<<2)+(3<<4)+(3<<6) )
+ #define stbir__simdf8_0123to21032103( out, in ) (out) = _mm256_shuffle_ps ( in, in, (2<<0)+(1<<2)+(0<<4)+(3<<6) )
+ #define stbir__simdf8_0123to32103210( out, in ) (out) = _mm256_shuffle_ps ( in, in, (3<<0)+(2<<2)+(1<<4)+(0<<6) )
+ #define stbir__simdf8_0123to12301230( out, in ) (out) = _mm256_shuffle_ps ( in, in, (1<<0)+(2<<2)+(3<<4)+(0<<6) )
+ #define stbir__simdf8_0123to10321032( out, in ) (out) = _mm256_shuffle_ps ( in, in, (1<<0)+(0<<2)+(3<<4)+(2<<6) )
+ #define stbir__simdf8_0123to30123012( out, in ) (out) = _mm256_shuffle_ps ( in, in, (3<<0)+(0<<2)+(1<<4)+(2<<6) )
+
+ #define stbir__simdf8_0123to11331133( out, in ) (out) = _mm256_shuffle_ps ( in, in, (1<<0)+(1<<2)+(3<<4)+(3<<6) )
+ #define stbir__simdf8_0123to00220022( out, in ) (out) = _mm256_shuffle_ps ( in, in, (0<<0)+(0<<2)+(2<<4)+(2<<6) )
+
+ #define stbir__simdf8_aaa1( out, alp, ones ) (out) = _mm256_blend_ps( alp, ones, (1<<0)+(1<<1)+(1<<2)+(0<<3)+(1<<4)+(1<<5)+(1<<6)+(0<<7)); (out)=_mm256_shuffle_ps( out,out, (3<<0) + (3<<2) + (3<<4) + (0<<6) )
+ #define stbir__simdf8_1aaa( out, alp, ones ) (out) = _mm256_blend_ps( alp, ones, (0<<0)+(1<<1)+(1<<2)+(1<<3)+(0<<4)+(1<<5)+(1<<6)+(1<<7)); (out)=_mm256_shuffle_ps( out,out, (1<<0) + (0<<2) + (0<<4) + (0<<6) )
+ #define stbir__simdf8_a1a1( out, alp, ones) (out) = _mm256_blend_ps( alp, ones, (1<<0)+(0<<1)+(1<<2)+(0<<3)+(1<<4)+(0<<5)+(1<<6)+(0<<7)); (out)=_mm256_shuffle_ps( out,out, (1<<0) + (0<<2) + (3<<4) + (2<<6) )
+ #define stbir__simdf8_1a1a( out, alp, ones) (out) = _mm256_blend_ps( alp, ones, (0<<0)+(1<<1)+(0<<2)+(1<<3)+(0<<4)+(1<<5)+(0<<6)+(1<<7)); (out)=_mm256_shuffle_ps( out,out, (1<<0) + (0<<2) + (3<<4) + (2<<6) )
+
+ #define stbir__simdf8_zero( reg ) (reg) = _mm256_setzero_ps()
+
+ #ifdef STBIR_USE_FMA // not on by default to maintain bit identical simd to non-simd
+ #define stbir__simdf8_madd( out, add, mul1, mul2 ) (out) = _mm256_fmadd_ps( mul1, mul2, add )
+ #define stbir__simdf8_madd_mem( out, add, mul, ptr ) (out) = _mm256_fmadd_ps( mul, _mm256_loadu_ps( (float const*)(ptr) ), add )
+ #define stbir__simdf8_madd_mem4( out, add, mul, ptr )(out) = _mm256_fmadd_ps( _mm256_setr_m128( mul, _mm_setzero_ps() ), _mm256_setr_m128( _mm_loadu_ps( (float const*)(ptr) ), _mm_setzero_ps() ), add )
+ #else
+ #define stbir__simdf8_madd( out, add, mul1, mul2 ) (out) = _mm256_add_ps( add, _mm256_mul_ps( mul1, mul2 ) )
+ #define stbir__simdf8_madd_mem( out, add, mul, ptr ) (out) = _mm256_add_ps( add, _mm256_mul_ps( mul, _mm256_loadu_ps( (float const*)(ptr) ) ) )
+ #define stbir__simdf8_madd_mem4( out, add, mul, ptr ) (out) = _mm256_add_ps( add, _mm256_setr_m128( _mm_mul_ps( mul, _mm_loadu_ps( (float const*)(ptr) ) ), _mm_setzero_ps() ) )
+ #endif
+ #define stbir__if_simdf8_cast_to_simdf4( val ) _mm256_castps256_ps128( val )
+
+ #endif
+
+ #ifdef STBIR_FLOORF
+ #undef STBIR_FLOORF
+ #endif
+ #define STBIR_FLOORF stbir_simd_floorf
+ static stbir__inline float stbir_simd_floorf(float x) // martins floorf
+ {
+ #if defined(STBIR_AVX) || defined(__SSE4_1__) || defined(STBIR_SSE41)
+ __m128 t = _mm_set_ss(x);
+ return _mm_cvtss_f32( _mm_floor_ss(t, t) );
+ #else
+ __m128 f = _mm_set_ss(x);
+ __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(f));
+ __m128 r = _mm_add_ss(t, _mm_and_ps(_mm_cmplt_ss(f, t), _mm_set_ss(-1.0f)));
+ return _mm_cvtss_f32(r);
+ #endif
+ }
+
+ #ifdef STBIR_CEILF
+ #undef STBIR_CEILF
+ #endif
+ #define STBIR_CEILF stbir_simd_ceilf
+ static stbir__inline float stbir_simd_ceilf(float x) // martins ceilf
+ {
+ #if defined(STBIR_AVX) || defined(__SSE4_1__) || defined(STBIR_SSE41)
+ __m128 t = _mm_set_ss(x);
+ return _mm_cvtss_f32( _mm_ceil_ss(t, t) );
+ #else
+ __m128 f = _mm_set_ss(x);
+ __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(f));
+ __m128 r = _mm_add_ss(t, _mm_and_ps(_mm_cmplt_ss(t, f), _mm_set_ss(1.0f)));
+ return _mm_cvtss_f32(r);
+ #endif
+ }
+
+#elif defined(STBIR_NEON)
+
+  #include <arm_neon.h>
+
+ #define stbir__simdf float32x4_t
+ #define stbir__simdi uint32x4_t
+
+ #define stbir_simdi_castf( reg ) vreinterpretq_u32_f32(reg)
+ #define stbir_simdf_casti( reg ) vreinterpretq_f32_u32(reg)
+
+ #define stbir__simdf_load( reg, ptr ) (reg) = vld1q_f32( (float const*)(ptr) )
+ #define stbir__simdi_load( reg, ptr ) (reg) = vld1q_u32( (uint32_t const*)(ptr) )
+ #define stbir__simdf_load1( out, ptr ) (out) = vld1q_dup_f32( (float const*)(ptr) ) // top values can be random (not denormal or nan for perf)
+ #define stbir__simdi_load1( out, ptr ) (out) = vld1q_dup_u32( (uint32_t const*)(ptr) )
+ #define stbir__simdf_load1z( out, ptr ) (out) = vld1q_lane_f32( (float const*)(ptr), vdupq_n_f32(0), 0 ) // top values must be zero
+ #define stbir__simdf_frep4( fvar ) vdupq_n_f32( fvar )
+ #define stbir__simdf_load1frep4( out, fvar ) (out) = vdupq_n_f32( fvar )
+ #define stbir__simdf_load2( out, ptr ) (out) = vcombine_f32( vld1_f32( (float const*)(ptr) ), vcreate_f32(0) ) // top values can be random (not denormal or nan for perf)
+ #define stbir__simdf_load2z( out, ptr ) (out) = vcombine_f32( vld1_f32( (float const*)(ptr) ), vcreate_f32(0) ) // top values must be zero
+ #define stbir__simdf_load2hmerge( out, reg, ptr ) (out) = vcombine_f32( vget_low_f32(reg), vld1_f32( (float const*)(ptr) ) )
+
+ #define stbir__simdf_zeroP() vdupq_n_f32(0)
+ #define stbir__simdf_zero( reg ) (reg) = vdupq_n_f32(0)
+
+ #define stbir__simdf_store( ptr, reg ) vst1q_f32( (float*)(ptr), reg )
+ #define stbir__simdf_store1( ptr, reg ) vst1q_lane_f32( (float*)(ptr), reg, 0)
+ #define stbir__simdf_store2( ptr, reg ) vst1_f32( (float*)(ptr), vget_low_f32(reg) )
+ #define stbir__simdf_store2h( ptr, reg ) vst1_f32( (float*)(ptr), vget_high_f32(reg) )
+
+ #define stbir__simdi_store( ptr, reg ) vst1q_u32( (uint32_t*)(ptr), reg )
+ #define stbir__simdi_store1( ptr, reg ) vst1q_lane_u32( (uint32_t*)(ptr), reg, 0 )
+ #define stbir__simdi_store2( ptr, reg ) vst1_u32( (uint32_t*)(ptr), vget_low_u32(reg) )
+
+ #define stbir__prefetch( ptr )
+
+ #define stbir__simdi_expand_u8_to_u32(out0,out1,out2,out3,ireg) \
+ { \
+ uint16x8_t l = vmovl_u8( vget_low_u8 ( vreinterpretq_u8_u32(ireg) ) ); \
+ uint16x8_t h = vmovl_u8( vget_high_u8( vreinterpretq_u8_u32(ireg) ) ); \
+ out0 = vmovl_u16( vget_low_u16 ( l ) ); \
+ out1 = vmovl_u16( vget_high_u16( l ) ); \
+ out2 = vmovl_u16( vget_low_u16 ( h ) ); \
+ out3 = vmovl_u16( vget_high_u16( h ) ); \
+ }
+
+ #define stbir__simdi_expand_u8_to_1u32(out,ireg) \
+ { \
+ uint16x8_t tmp = vmovl_u8( vget_low_u8( vreinterpretq_u8_u32(ireg) ) ); \
+ out = vmovl_u16( vget_low_u16( tmp ) ); \
+ }
+
+ #define stbir__simdi_expand_u16_to_u32(out0,out1,ireg) \
+ { \
+ uint16x8_t tmp = vreinterpretq_u16_u32(ireg); \
+ out0 = vmovl_u16( vget_low_u16 ( tmp ) ); \
+ out1 = vmovl_u16( vget_high_u16( tmp ) ); \
+ }
+
+ #define stbir__simdf_convert_float_to_i32( i, f ) (i) = vreinterpretq_u32_s32( vcvtq_s32_f32(f) )
+ #define stbir__simdf_convert_float_to_int( f ) vgetq_lane_s32(vcvtq_s32_f32(f), 0)
+ #define stbir__simdi_to_int( i ) (int)vgetq_lane_u32(i, 0)
+ #define stbir__simdf_convert_float_to_uint8( f ) ((unsigned char)vgetq_lane_s32(vcvtq_s32_f32(vmaxq_f32(vminq_f32(f,STBIR__CONSTF(STBIR_max_uint8_as_float)),vdupq_n_f32(0))), 0))
+ #define stbir__simdf_convert_float_to_short( f ) ((unsigned short)vgetq_lane_s32(vcvtq_s32_f32(vmaxq_f32(vminq_f32(f,STBIR__CONSTF(STBIR_max_uint16_as_float)),vdupq_n_f32(0))), 0))
+ #define stbir__simdi_convert_i32_to_float(out, ireg) (out) = vcvtq_f32_s32( vreinterpretq_s32_u32(ireg) )
+ #define stbir__simdf_add( out, reg0, reg1 ) (out) = vaddq_f32( reg0, reg1 )
+ #define stbir__simdf_mult( out, reg0, reg1 ) (out) = vmulq_f32( reg0, reg1 )
+ #define stbir__simdf_mult_mem( out, reg, ptr ) (out) = vmulq_f32( reg, vld1q_f32( (float const*)(ptr) ) )
+ #define stbir__simdf_mult1_mem( out, reg, ptr ) (out) = vmulq_f32( reg, vld1q_dup_f32( (float const*)(ptr) ) )
+ #define stbir__simdf_add_mem( out, reg, ptr ) (out) = vaddq_f32( reg, vld1q_f32( (float const*)(ptr) ) )
+ #define stbir__simdf_add1_mem( out, reg, ptr ) (out) = vaddq_f32( reg, vld1q_dup_f32( (float const*)(ptr) ) )
+
+ #ifdef STBIR_USE_FMA // not on by default to maintain bit identical simd to non-simd (and also x64 no madd to arm madd)
+ #define stbir__simdf_madd( out, add, mul1, mul2 ) (out) = vfmaq_f32( add, mul1, mul2 )
+ #define stbir__simdf_madd1( out, add, mul1, mul2 ) (out) = vfmaq_f32( add, mul1, mul2 )
+ #define stbir__simdf_madd_mem( out, add, mul, ptr ) (out) = vfmaq_f32( add, mul, vld1q_f32( (float const*)(ptr) ) )
+ #define stbir__simdf_madd1_mem( out, add, mul, ptr ) (out) = vfmaq_f32( add, mul, vld1q_dup_f32( (float const*)(ptr) ) )
+ #else
+ #define stbir__simdf_madd( out, add, mul1, mul2 ) (out) = vaddq_f32( add, vmulq_f32( mul1, mul2 ) )
+ #define stbir__simdf_madd1( out, add, mul1, mul2 ) (out) = vaddq_f32( add, vmulq_f32( mul1, mul2 ) )
+ #define stbir__simdf_madd_mem( out, add, mul, ptr ) (out) = vaddq_f32( add, vmulq_f32( mul, vld1q_f32( (float const*)(ptr) ) ) )
+ #define stbir__simdf_madd1_mem( out, add, mul, ptr ) (out) = vaddq_f32( add, vmulq_f32( mul, vld1q_dup_f32( (float const*)(ptr) ) ) )
+ #endif
+
+ #define stbir__simdf_add1( out, reg0, reg1 ) (out) = vaddq_f32( reg0, reg1 )
+ #define stbir__simdf_mult1( out, reg0, reg1 ) (out) = vmulq_f32( reg0, reg1 )
+
+ #define stbir__simdf_and( out, reg0, reg1 ) (out) = vreinterpretq_f32_u32( vandq_u32( vreinterpretq_u32_f32(reg0), vreinterpretq_u32_f32(reg1) ) )
+ #define stbir__simdf_or( out, reg0, reg1 ) (out) = vreinterpretq_f32_u32( vorrq_u32( vreinterpretq_u32_f32(reg0), vreinterpretq_u32_f32(reg1) ) )
+
+ #define stbir__simdf_min( out, reg0, reg1 ) (out) = vminq_f32( reg0, reg1 )
+ #define stbir__simdf_max( out, reg0, reg1 ) (out) = vmaxq_f32( reg0, reg1 )
+ #define stbir__simdf_min1( out, reg0, reg1 ) (out) = vminq_f32( reg0, reg1 )
+ #define stbir__simdf_max1( out, reg0, reg1 ) (out) = vmaxq_f32( reg0, reg1 )
+
+ #define stbir__simdf_0123ABCDto3ABx( out, reg0, reg1 ) (out) = vextq_f32( reg0, reg1, 3 )
+ #define stbir__simdf_0123ABCDto23Ax( out, reg0, reg1 ) (out) = vextq_f32( reg0, reg1, 2 )
+
+ #define stbir__simdf_a1a1( out, alp, ones ) (out) = vzipq_f32(vuzpq_f32(alp, alp).val[1], ones).val[0]
+ #define stbir__simdf_1a1a( out, alp, ones ) (out) = vzipq_f32(ones, vuzpq_f32(alp, alp).val[0]).val[0]
+
+ #if defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ )
+
+ #define stbir__simdf_aaa1( out, alp, ones ) (out) = vcopyq_laneq_f32(vdupq_n_f32(vgetq_lane_f32(alp, 3)), 3, ones, 3)
+ #define stbir__simdf_1aaa( out, alp, ones ) (out) = vcopyq_laneq_f32(vdupq_n_f32(vgetq_lane_f32(alp, 0)), 0, ones, 0)
+
+ #if defined( _MSC_VER ) && !defined(__clang__)
+ #define stbir_make16(a,b,c,d) vcombine_u8( \
+ vcreate_u8( (4*a+0) | ((4*a+1)<<8) | ((4*a+2)<<16) | ((4*a+3)<<24) | \
+ ((stbir_uint64)(4*b+0)<<32) | ((stbir_uint64)(4*b+1)<<40) | ((stbir_uint64)(4*b+2)<<48) | ((stbir_uint64)(4*b+3)<<56)), \
+ vcreate_u8( (4*c+0) | ((4*c+1)<<8) | ((4*c+2)<<16) | ((4*c+3)<<24) | \
+ ((stbir_uint64)(4*d+0)<<32) | ((stbir_uint64)(4*d+1)<<40) | ((stbir_uint64)(4*d+2)<<48) | ((stbir_uint64)(4*d+3)<<56) ) )
+
+ static stbir__inline uint8x16x2_t stbir_make16x2(float32x4_t rega,float32x4_t regb)
+ {
+ uint8x16x2_t r = { vreinterpretq_u8_f32(rega), vreinterpretq_u8_f32(regb) };
+ return r;
+ }
+ #else
+ #define stbir_make16(a,b,c,d) (uint8x16_t){4*a+0,4*a+1,4*a+2,4*a+3,4*b+0,4*b+1,4*b+2,4*b+3,4*c+0,4*c+1,4*c+2,4*c+3,4*d+0,4*d+1,4*d+2,4*d+3}
+ #define stbir_make16x2(a,b) (uint8x16x2_t){{vreinterpretq_u8_f32(a),vreinterpretq_u8_f32(b)}}
+ #endif
+
+ #define stbir__simdf_swiz( reg, one, two, three, four ) vreinterpretq_f32_u8( vqtbl1q_u8( vreinterpretq_u8_f32(reg), stbir_make16(one, two, three, four) ) )
+ #define stbir__simdf_swiz2( rega, regb, one, two, three, four ) vreinterpretq_f32_u8( vqtbl2q_u8( stbir_make16x2(rega,regb), stbir_make16(one, two, three, four) ) )
+
+ #define stbir__simdi_16madd( out, reg0, reg1 ) \
+ { \
+ int16x8_t r0 = vreinterpretq_s16_u32(reg0); \
+ int16x8_t r1 = vreinterpretq_s16_u32(reg1); \
+ int32x4_t tmp0 = vmull_s16( vget_low_s16(r0), vget_low_s16(r1) ); \
+ int32x4_t tmp1 = vmull_s16( vget_high_s16(r0), vget_high_s16(r1) ); \
+ (out) = vreinterpretq_u32_s32( vpaddq_s32(tmp0, tmp1) ); \
+ }
+
+ #else
+
+ #define stbir__simdf_aaa1( out, alp, ones ) (out) = vsetq_lane_f32(1.0f, vdupq_n_f32(vgetq_lane_f32(alp, 3)), 3)
+ #define stbir__simdf_1aaa( out, alp, ones ) (out) = vsetq_lane_f32(1.0f, vdupq_n_f32(vgetq_lane_f32(alp, 0)), 0)
+
+ #if defined( _MSC_VER ) && !defined(__clang__)
+ static stbir__inline uint8x8x2_t stbir_make8x2(float32x4_t reg)
+ {
+ uint8x8x2_t r = { { vget_low_u8(vreinterpretq_u8_f32(reg)), vget_high_u8(vreinterpretq_u8_f32(reg)) } };
+ return r;
+ }
+ #define stbir_make8(a,b) vcreate_u8( \
+ (4*a+0) | ((4*a+1)<<8) | ((4*a+2)<<16) | ((4*a+3)<<24) | \
+ ((stbir_uint64)(4*b+0)<<32) | ((stbir_uint64)(4*b+1)<<40) | ((stbir_uint64)(4*b+2)<<48) | ((stbir_uint64)(4*b+3)<<56) )
+ #else
+ #define stbir_make8x2(reg) (uint8x8x2_t){ { vget_low_u8(vreinterpretq_u8_f32(reg)), vget_high_u8(vreinterpretq_u8_f32(reg)) } }
+ #define stbir_make8(a,b) (uint8x8_t){4*a+0,4*a+1,4*a+2,4*a+3,4*b+0,4*b+1,4*b+2,4*b+3}
+ #endif
+
+ #define stbir__simdf_swiz( reg, one, two, three, four ) vreinterpretq_f32_u8( vcombine_u8( \
+ vtbl2_u8( stbir_make8x2( reg ), stbir_make8( one, two ) ), \
+ vtbl2_u8( stbir_make8x2( reg ), stbir_make8( three, four ) ) ) )
+
+ #define stbir__simdi_16madd( out, reg0, reg1 ) \
+ { \
+ int16x8_t r0 = vreinterpretq_s16_u32(reg0); \
+ int16x8_t r1 = vreinterpretq_s16_u32(reg1); \
+ int32x4_t tmp0 = vmull_s16( vget_low_s16(r0), vget_low_s16(r1) ); \
+ int32x4_t tmp1 = vmull_s16( vget_high_s16(r0), vget_high_s16(r1) ); \
+ int32x2_t out0 = vpadd_s32( vget_low_s32(tmp0), vget_high_s32(tmp0) ); \
+ int32x2_t out1 = vpadd_s32( vget_low_s32(tmp1), vget_high_s32(tmp1) ); \
+ (out) = vreinterpretq_u32_s32( vcombine_s32(out0, out1) ); \
+ }
+
+ #endif
+
+ #define stbir__simdi_and( out, reg0, reg1 ) (out) = vandq_u32( reg0, reg1 )
+ #define stbir__simdi_or( out, reg0, reg1 ) (out) = vorrq_u32( reg0, reg1 )
+
+ #define stbir__simdf_pack_to_8bytes(out,aa,bb) \
+ { \
+ float32x4_t af = vmaxq_f32( vminq_f32(aa,STBIR__CONSTF(STBIR_max_uint8_as_float) ), vdupq_n_f32(0) ); \
+ float32x4_t bf = vmaxq_f32( vminq_f32(bb,STBIR__CONSTF(STBIR_max_uint8_as_float) ), vdupq_n_f32(0) ); \
+ int16x4_t ai = vqmovn_s32( vcvtq_s32_f32( af ) ); \
+ int16x4_t bi = vqmovn_s32( vcvtq_s32_f32( bf ) ); \
+ uint8x8_t out8 = vqmovun_s16( vcombine_s16(ai, bi) ); \
+ out = vreinterpretq_u32_u8( vcombine_u8(out8, out8) ); \
+ }
+
+ #define stbir__simdf_pack_to_8words(out,aa,bb) \
+ { \
+ float32x4_t af = vmaxq_f32( vminq_f32(aa,STBIR__CONSTF(STBIR_max_uint16_as_float) ), vdupq_n_f32(0) ); \
+ float32x4_t bf = vmaxq_f32( vminq_f32(bb,STBIR__CONSTF(STBIR_max_uint16_as_float) ), vdupq_n_f32(0) ); \
+ int32x4_t ai = vcvtq_s32_f32( af ); \
+ int32x4_t bi = vcvtq_s32_f32( bf ); \
+ out = vreinterpretq_u32_u16( vcombine_u16(vqmovun_s32(ai), vqmovun_s32(bi)) ); \
+ }
+
+ #define stbir__interleave_pack_and_store_16_u8( ptr, r0, r1, r2, r3 ) \
+ { \
+ int16x4x2_t tmp0 = vzip_s16( vqmovn_s32(vreinterpretq_s32_u32(r0)), vqmovn_s32(vreinterpretq_s32_u32(r2)) ); \
+ int16x4x2_t tmp1 = vzip_s16( vqmovn_s32(vreinterpretq_s32_u32(r1)), vqmovn_s32(vreinterpretq_s32_u32(r3)) ); \
+ uint8x8x2_t out = \
+ { { \
+ vqmovun_s16( vcombine_s16(tmp0.val[0], tmp0.val[1]) ), \
+ vqmovun_s16( vcombine_s16(tmp1.val[0], tmp1.val[1]) ), \
+ } }; \
+ vst2_u8(ptr, out); \
+ }
+
+ #define stbir__simdf_load4_transposed( o0, o1, o2, o3, ptr ) \
+ { \
+ float32x4x4_t tmp = vld4q_f32(ptr); \
+ o0 = tmp.val[0]; \
+ o1 = tmp.val[1]; \
+ o2 = tmp.val[2]; \
+ o3 = tmp.val[3]; \
+ }
+
+ #define stbir__simdi_32shr( out, reg, imm ) out = vshrq_n_u32( reg, imm )
+
+ #if defined( _MSC_VER ) && !defined(__clang__)
+ #define STBIR__SIMDF_CONST(var, x) __declspec(align(8)) float var[] = { x, x, x, x }
+ #define STBIR__SIMDI_CONST(var, x) __declspec(align(8)) uint32_t var[] = { x, x, x, x }
+ #define STBIR__CONSTF(var) (*(const float32x4_t*)var)
+ #define STBIR__CONSTI(var) (*(const uint32x4_t*)var)
+ #else
+ #define STBIR__SIMDF_CONST(var, x) stbir__simdf var = { x, x, x, x }
+ #define STBIR__SIMDI_CONST(var, x) stbir__simdi var = { x, x, x, x }
+ #define STBIR__CONSTF(var) (var)
+ #define STBIR__CONSTI(var) (var)
+ #endif
+
+ #ifdef STBIR_FLOORF
+ #undef STBIR_FLOORF
+ #endif
+ #define STBIR_FLOORF stbir_simd_floorf
+ static stbir__inline float stbir_simd_floorf(float x)
+ {
+ #if defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ )
+ return vget_lane_f32( vrndm_f32( vdup_n_f32(x) ), 0);
+ #else
+ float32x2_t f = vdup_n_f32(x);
+ float32x2_t t = vcvt_f32_s32(vcvt_s32_f32(f));
+ uint32x2_t a = vclt_f32(f, t);
+ uint32x2_t b = vreinterpret_u32_f32(vdup_n_f32(-1.0f));
+ float32x2_t r = vadd_f32(t, vreinterpret_f32_u32(vand_u32(a, b)));
+ return vget_lane_f32(r, 0);
+ #endif
+ }
+
+ #ifdef STBIR_CEILF
+ #undef STBIR_CEILF
+ #endif
+ #define STBIR_CEILF stbir_simd_ceilf
+ static stbir__inline float stbir_simd_ceilf(float x)
+ {
+ #if defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ )
+ return vget_lane_f32( vrndp_f32( vdup_n_f32(x) ), 0);
+ #else
+ float32x2_t f = vdup_n_f32(x);
+ float32x2_t t = vcvt_f32_s32(vcvt_s32_f32(f));
+ uint32x2_t a = vclt_f32(t, f);
+ uint32x2_t b = vreinterpret_u32_f32(vdup_n_f32(1.0f));
+ float32x2_t r = vadd_f32(t, vreinterpret_f32_u32(vand_u32(a, b)));
+ return vget_lane_f32(r, 0);
+ #endif
+ }
+
+ #define STBIR_SIMD
+
+#elif defined(STBIR_WASM)
+
+  #include <wasm_simd128.h>
+
+ #define stbir__simdf v128_t
+ #define stbir__simdi v128_t
+
+ #define stbir_simdi_castf( reg ) (reg)
+ #define stbir_simdf_casti( reg ) (reg)
+
+ #define stbir__simdf_load( reg, ptr ) (reg) = wasm_v128_load( (void const*)(ptr) )
+ #define stbir__simdi_load( reg, ptr ) (reg) = wasm_v128_load( (void const*)(ptr) )
+ #define stbir__simdf_load1( out, ptr ) (out) = wasm_v128_load32_splat( (void const*)(ptr) ) // top values can be random (not denormal or nan for perf)
+ #define stbir__simdi_load1( out, ptr ) (out) = wasm_v128_load32_splat( (void const*)(ptr) )
+ #define stbir__simdf_load1z( out, ptr ) (out) = wasm_v128_load32_zero( (void const*)(ptr) ) // top values must be zero
+ #define stbir__simdf_frep4( fvar ) wasm_f32x4_splat( fvar )
+ #define stbir__simdf_load1frep4( out, fvar ) (out) = wasm_f32x4_splat( fvar )
+ #define stbir__simdf_load2( out, ptr ) (out) = wasm_v128_load64_splat( (void const*)(ptr) ) // top values can be random (not denormal or nan for perf)
+ #define stbir__simdf_load2z( out, ptr ) (out) = wasm_v128_load64_zero( (void const*)(ptr) ) // top values must be zero
+ #define stbir__simdf_load2hmerge( out, reg, ptr ) (out) = wasm_v128_load64_lane( (void const*)(ptr), reg, 1 )
+
+ #define stbir__simdf_zeroP() wasm_f32x4_const_splat(0)
+ #define stbir__simdf_zero( reg ) (reg) = wasm_f32x4_const_splat(0)
+
+ #define stbir__simdf_store( ptr, reg ) wasm_v128_store( (void*)(ptr), reg )
+ #define stbir__simdf_store1( ptr, reg ) wasm_v128_store32_lane( (void*)(ptr), reg, 0 )
+ #define stbir__simdf_store2( ptr, reg ) wasm_v128_store64_lane( (void*)(ptr), reg, 0 )
+ #define stbir__simdf_store2h( ptr, reg ) wasm_v128_store64_lane( (void*)(ptr), reg, 1 )
+
+ #define stbir__simdi_store( ptr, reg ) wasm_v128_store( (void*)(ptr), reg )
+ #define stbir__simdi_store1( ptr, reg ) wasm_v128_store32_lane( (void*)(ptr), reg, 0 )
+ #define stbir__simdi_store2( ptr, reg ) wasm_v128_store64_lane( (void*)(ptr), reg, 0 )
+
+ #define stbir__prefetch( ptr )
+
+ #define stbir__simdi_expand_u8_to_u32(out0,out1,out2,out3,ireg) \
+ { \
+ v128_t l = wasm_u16x8_extend_low_u8x16 ( ireg ); \
+ v128_t h = wasm_u16x8_extend_high_u8x16( ireg ); \
+ out0 = wasm_u32x4_extend_low_u16x8 ( l ); \
+ out1 = wasm_u32x4_extend_high_u16x8( l ); \
+ out2 = wasm_u32x4_extend_low_u16x8 ( h ); \
+ out3 = wasm_u32x4_extend_high_u16x8( h ); \
+ }
+
+ #define stbir__simdi_expand_u8_to_1u32(out,ireg) \
+ { \
+ v128_t tmp = wasm_u16x8_extend_low_u8x16(ireg); \
+ out = wasm_u32x4_extend_low_u16x8(tmp); \
+ }
+
+ #define stbir__simdi_expand_u16_to_u32(out0,out1,ireg) \
+ { \
+ out0 = wasm_u32x4_extend_low_u16x8 ( ireg ); \
+ out1 = wasm_u32x4_extend_high_u16x8( ireg ); \
+ }
+
+ #define stbir__simdf_convert_float_to_i32( i, f ) (i) = wasm_i32x4_trunc_sat_f32x4(f)
+ #define stbir__simdf_convert_float_to_int( f ) wasm_i32x4_extract_lane(wasm_i32x4_trunc_sat_f32x4(f), 0)
+ #define stbir__simdi_to_int( i ) wasm_i32x4_extract_lane(i, 0)
+ #define stbir__simdf_convert_float_to_uint8( f ) ((unsigned char)wasm_i32x4_extract_lane(wasm_i32x4_trunc_sat_f32x4(wasm_f32x4_max(wasm_f32x4_min(f,STBIR_max_uint8_as_float),wasm_f32x4_const_splat(0))), 0))
+ #define stbir__simdf_convert_float_to_short( f ) ((unsigned short)wasm_i32x4_extract_lane(wasm_i32x4_trunc_sat_f32x4(wasm_f32x4_max(wasm_f32x4_min(f,STBIR_max_uint16_as_float),wasm_f32x4_const_splat(0))), 0))
+ #define stbir__simdi_convert_i32_to_float(out, ireg) (out) = wasm_f32x4_convert_i32x4(ireg)
+ #define stbir__simdf_add( out, reg0, reg1 ) (out) = wasm_f32x4_add( reg0, reg1 )
+ #define stbir__simdf_mult( out, reg0, reg1 ) (out) = wasm_f32x4_mul( reg0, reg1 )
+ #define stbir__simdf_mult_mem( out, reg, ptr ) (out) = wasm_f32x4_mul( reg, wasm_v128_load( (void const*)(ptr) ) )
+ #define stbir__simdf_mult1_mem( out, reg, ptr ) (out) = wasm_f32x4_mul( reg, wasm_v128_load32_splat( (void const*)(ptr) ) )
+ #define stbir__simdf_add_mem( out, reg, ptr ) (out) = wasm_f32x4_add( reg, wasm_v128_load( (void const*)(ptr) ) )
+ #define stbir__simdf_add1_mem( out, reg, ptr ) (out) = wasm_f32x4_add( reg, wasm_v128_load32_splat( (void const*)(ptr) ) )
+
+ #define stbir__simdf_madd( out, add, mul1, mul2 ) (out) = wasm_f32x4_add( add, wasm_f32x4_mul( mul1, mul2 ) )
+ #define stbir__simdf_madd1( out, add, mul1, mul2 ) (out) = wasm_f32x4_add( add, wasm_f32x4_mul( mul1, mul2 ) )
+ #define stbir__simdf_madd_mem( out, add, mul, ptr ) (out) = wasm_f32x4_add( add, wasm_f32x4_mul( mul, wasm_v128_load( (void const*)(ptr) ) ) )
+ #define stbir__simdf_madd1_mem( out, add, mul, ptr ) (out) = wasm_f32x4_add( add, wasm_f32x4_mul( mul, wasm_v128_load32_splat( (void const*)(ptr) ) ) )
+
+ #define stbir__simdf_add1( out, reg0, reg1 ) (out) = wasm_f32x4_add( reg0, reg1 )
+ #define stbir__simdf_mult1( out, reg0, reg1 ) (out) = wasm_f32x4_mul( reg0, reg1 )
+
+ #define stbir__simdf_and( out, reg0, reg1 ) (out) = wasm_v128_and( reg0, reg1 )
+ #define stbir__simdf_or( out, reg0, reg1 ) (out) = wasm_v128_or( reg0, reg1 )
+
+ #define stbir__simdf_min( out, reg0, reg1 ) (out) = wasm_f32x4_min( reg0, reg1 )
+ #define stbir__simdf_max( out, reg0, reg1 ) (out) = wasm_f32x4_max( reg0, reg1 )
+ #define stbir__simdf_min1( out, reg0, reg1 ) (out) = wasm_f32x4_min( reg0, reg1 )
+ #define stbir__simdf_max1( out, reg0, reg1 ) (out) = wasm_f32x4_max( reg0, reg1 )
+
+ #define stbir__simdf_0123ABCDto3ABx( out, reg0, reg1 ) (out) = wasm_i32x4_shuffle( reg0, reg1, 3, 4, 5, -1 )
+ #define stbir__simdf_0123ABCDto23Ax( out, reg0, reg1 ) (out) = wasm_i32x4_shuffle( reg0, reg1, 2, 3, 4, -1 )
+
+ #define stbir__simdf_aaa1(out,alp,ones) (out) = wasm_i32x4_shuffle(alp, ones, 3, 3, 3, 4)
+ #define stbir__simdf_1aaa(out,alp,ones) (out) = wasm_i32x4_shuffle(alp, ones, 4, 0, 0, 0)
+ #define stbir__simdf_a1a1(out,alp,ones) (out) = wasm_i32x4_shuffle(alp, ones, 1, 4, 3, 4)
+ #define stbir__simdf_1a1a(out,alp,ones) (out) = wasm_i32x4_shuffle(alp, ones, 4, 0, 4, 2)
+
+ #define stbir__simdf_swiz( reg, one, two, three, four ) wasm_i32x4_shuffle(reg, reg, one, two, three, four)
+
+ #define stbir__simdi_and( out, reg0, reg1 ) (out) = wasm_v128_and( reg0, reg1 )
+ #define stbir__simdi_or( out, reg0, reg1 ) (out) = wasm_v128_or( reg0, reg1 )
+ #define stbir__simdi_16madd( out, reg0, reg1 ) (out) = wasm_i32x4_dot_i16x8( reg0, reg1 )
+
+ #define stbir__simdf_pack_to_8bytes(out,aa,bb) \
+ { \
+ v128_t af = wasm_f32x4_max( wasm_f32x4_min(aa, STBIR_max_uint8_as_float), wasm_f32x4_const_splat(0) ); \
+ v128_t bf = wasm_f32x4_max( wasm_f32x4_min(bb, STBIR_max_uint8_as_float), wasm_f32x4_const_splat(0) ); \
+ v128_t ai = wasm_i32x4_trunc_sat_f32x4( af ); \
+ v128_t bi = wasm_i32x4_trunc_sat_f32x4( bf ); \
+ v128_t out16 = wasm_i16x8_narrow_i32x4( ai, bi ); \
+ out = wasm_u8x16_narrow_i16x8( out16, out16 ); \
+ }
+
+ #define stbir__simdf_pack_to_8words(out,aa,bb) \
+ { \
+ v128_t af = wasm_f32x4_max( wasm_f32x4_min(aa, STBIR_max_uint16_as_float), wasm_f32x4_const_splat(0)); \
+ v128_t bf = wasm_f32x4_max( wasm_f32x4_min(bb, STBIR_max_uint16_as_float), wasm_f32x4_const_splat(0)); \
+ v128_t ai = wasm_i32x4_trunc_sat_f32x4( af ); \
+ v128_t bi = wasm_i32x4_trunc_sat_f32x4( bf ); \
+ out = wasm_u16x8_narrow_i32x4( ai, bi ); \
+ }
+
+ #define stbir__interleave_pack_and_store_16_u8( ptr, r0, r1, r2, r3 ) \
+ { \
+ v128_t tmp0 = wasm_i16x8_narrow_i32x4(r0, r1); \
+ v128_t tmp1 = wasm_i16x8_narrow_i32x4(r2, r3); \
+ v128_t tmp = wasm_u8x16_narrow_i16x8(tmp0, tmp1); \
+ tmp = wasm_i8x16_shuffle(tmp, tmp, 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15); \
+ wasm_v128_store( (void*)(ptr), tmp); \
+ }
+
+ #define stbir__simdf_load4_transposed( o0, o1, o2, o3, ptr ) \
+ { \
+ v128_t t0 = wasm_v128_load( ptr ); \
+ v128_t t1 = wasm_v128_load( ptr+4 ); \
+ v128_t t2 = wasm_v128_load( ptr+8 ); \
+ v128_t t3 = wasm_v128_load( ptr+12 ); \
+ v128_t s0 = wasm_i32x4_shuffle(t0, t1, 0, 4, 2, 6); \
+ v128_t s1 = wasm_i32x4_shuffle(t0, t1, 1, 5, 3, 7); \
+ v128_t s2 = wasm_i32x4_shuffle(t2, t3, 0, 4, 2, 6); \
+ v128_t s3 = wasm_i32x4_shuffle(t2, t3, 1, 5, 3, 7); \
+ o0 = wasm_i32x4_shuffle(s0, s2, 0, 1, 4, 5); \
+ o1 = wasm_i32x4_shuffle(s1, s3, 0, 1, 4, 5); \
+ o2 = wasm_i32x4_shuffle(s0, s2, 2, 3, 6, 7); \
+ o3 = wasm_i32x4_shuffle(s1, s3, 2, 3, 6, 7); \
+ }
+
+ #define stbir__simdi_32shr( out, reg, imm ) out = wasm_u32x4_shr( reg, imm )
+
+ typedef float stbir__f32x4 __attribute__((__vector_size__(16), __aligned__(16)));
+ #define STBIR__SIMDF_CONST(var, x) stbir__simdf var = (v128_t)(stbir__f32x4){ x, x, x, x }
+ #define STBIR__SIMDI_CONST(var, x) stbir__simdi var = { x, x, x, x }
+ #define STBIR__CONSTF(var) (var)
+ #define STBIR__CONSTI(var) (var)
+
+ #ifdef STBIR_FLOORF
+ #undef STBIR_FLOORF
+ #endif
+ #define STBIR_FLOORF stbir_simd_floorf
+ static stbir__inline float stbir_simd_floorf(float x)
+ {
+ return wasm_f32x4_extract_lane( wasm_f32x4_floor( wasm_f32x4_splat(x) ), 0);
+ }
+
+ #ifdef STBIR_CEILF
+ #undef STBIR_CEILF
+ #endif
+ #define STBIR_CEILF stbir_simd_ceilf
+ static stbir__inline float stbir_simd_ceilf(float x)
+ {
+ return wasm_f32x4_extract_lane( wasm_f32x4_ceil( wasm_f32x4_splat(x) ), 0);
+ }
+
+ #define STBIR_SIMD
+
+#endif // SSE2/NEON/WASM
+
+#endif // NO SIMD
+
+#ifdef STBIR_SIMD8
+ #define stbir__simdfX stbir__simdf8
+ #define stbir__simdiX stbir__simdi8
+ #define stbir__simdfX_load stbir__simdf8_load
+ #define stbir__simdiX_load stbir__simdi8_load
+ #define stbir__simdfX_mult stbir__simdf8_mult
+ #define stbir__simdfX_add_mem stbir__simdf8_add_mem
+ #define stbir__simdfX_madd_mem stbir__simdf8_madd_mem
+ #define stbir__simdfX_store stbir__simdf8_store
+ #define stbir__simdiX_store stbir__simdi8_store
+ #define stbir__simdf_frepX stbir__simdf8_frep8
+ #define stbir__simdfX_madd stbir__simdf8_madd
+ #define stbir__simdfX_min stbir__simdf8_min
+ #define stbir__simdfX_max stbir__simdf8_max
+ #define stbir__simdfX_aaa1 stbir__simdf8_aaa1
+ #define stbir__simdfX_1aaa stbir__simdf8_1aaa
+ #define stbir__simdfX_a1a1 stbir__simdf8_a1a1
+ #define stbir__simdfX_1a1a stbir__simdf8_1a1a
+ #define stbir__simdfX_convert_float_to_i32 stbir__simdf8_convert_float_to_i32
+ #define stbir__simdfX_pack_to_words stbir__simdf8_pack_to_16words
+ #define stbir__simdfX_zero stbir__simdf8_zero
+ #define STBIR_onesX STBIR_ones8
+ #define STBIR_max_uint8_as_floatX STBIR_max_uint8_as_float8
+ #define STBIR_max_uint16_as_floatX STBIR_max_uint16_as_float8
+ #define STBIR_simd_point5X STBIR_simd_point58
+ #define stbir__simdfX_float_count 8
+ #define stbir__simdfX_0123to1230 stbir__simdf8_0123to12301230
+ #define stbir__simdfX_0123to2103 stbir__simdf8_0123to21032103
+ static const stbir__simdf8 STBIR_max_uint16_as_float_inverted8 = { stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted,stbir__max_uint16_as_float_inverted };
+ static const stbir__simdf8 STBIR_max_uint8_as_float_inverted8 = { stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted,stbir__max_uint8_as_float_inverted };
+ static const stbir__simdf8 STBIR_ones8 = { 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0 };
+ static const stbir__simdf8 STBIR_simd_point58 = { 0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5 };
+ static const stbir__simdf8 STBIR_max_uint8_as_float8 = { stbir__max_uint8_as_float,stbir__max_uint8_as_float,stbir__max_uint8_as_float,stbir__max_uint8_as_float, stbir__max_uint8_as_float,stbir__max_uint8_as_float,stbir__max_uint8_as_float,stbir__max_uint8_as_float };
+ static const stbir__simdf8 STBIR_max_uint16_as_float8 = { stbir__max_uint16_as_float,stbir__max_uint16_as_float,stbir__max_uint16_as_float,stbir__max_uint16_as_float, stbir__max_uint16_as_float,stbir__max_uint16_as_float,stbir__max_uint16_as_float,stbir__max_uint16_as_float };
+#else
+ #define stbir__simdfX stbir__simdf
+ #define stbir__simdiX stbir__simdi
+ #define stbir__simdfX_load stbir__simdf_load
+ #define stbir__simdiX_load stbir__simdi_load
+ #define stbir__simdfX_mult stbir__simdf_mult
+ #define stbir__simdfX_add_mem stbir__simdf_add_mem
+ #define stbir__simdfX_madd_mem stbir__simdf_madd_mem
+ #define stbir__simdfX_store stbir__simdf_store
+ #define stbir__simdiX_store stbir__simdi_store
+ #define stbir__simdf_frepX stbir__simdf_frep4
+ #define stbir__simdfX_madd stbir__simdf_madd
+ #define stbir__simdfX_min stbir__simdf_min
+ #define stbir__simdfX_max stbir__simdf_max
+ #define stbir__simdfX_aaa1 stbir__simdf_aaa1
+ #define stbir__simdfX_1aaa stbir__simdf_1aaa
+ #define stbir__simdfX_a1a1 stbir__simdf_a1a1
+ #define stbir__simdfX_1a1a stbir__simdf_1a1a
+ #define stbir__simdfX_convert_float_to_i32 stbir__simdf_convert_float_to_i32
+ #define stbir__simdfX_pack_to_words stbir__simdf_pack_to_8words
+ #define stbir__simdfX_zero stbir__simdf_zero
+ #define STBIR_onesX STBIR__CONSTF(STBIR_ones)
+ #define STBIR_simd_point5X STBIR__CONSTF(STBIR_simd_point5)
+ #define STBIR_max_uint8_as_floatX STBIR__CONSTF(STBIR_max_uint8_as_float)
+ #define STBIR_max_uint16_as_floatX STBIR__CONSTF(STBIR_max_uint16_as_float)
+ #define stbir__simdfX_float_count 4
+ #define stbir__if_simdf8_cast_to_simdf4( val ) ( val )
+ #define stbir__simdfX_0123to1230 stbir__simdf_0123to1230
+ #define stbir__simdfX_0123to2103 stbir__simdf_0123to2103
+#endif
+
+
+#if defined(STBIR_NEON) && !defined(_M_ARM) && !defined(__arm__)
+
+ #if defined( _MSC_VER ) && !defined(__clang__)
+ typedef __int16 stbir__FP16;
+ #else
+ typedef float16_t stbir__FP16;
+ #endif
+
+#else // no NEON, or 32-bit ARM for MSVC
+
+ typedef union stbir__FP16
+ {
+ unsigned short u;
+ } stbir__FP16;
+
+#endif
+
+#if (!defined(STBIR_NEON) && !defined(STBIR_FP16C)) || (defined(STBIR_NEON) && defined(_M_ARM)) || (defined(STBIR_NEON) && defined(__arm__))
+
+ // Fabian's half float routines, see: https://gist.github.com/rygorous/2156668
+
+ static stbir__inline float stbir__half_to_float( stbir__FP16 h )
+ {
+ static const stbir__FP32 magic = { (254 - 15) << 23 };
+ static const stbir__FP32 was_infnan = { (127 + 16) << 23 };
+ stbir__FP32 o;
+
+ o.u = (h.u & 0x7fff) << 13; // exponent/mantissa bits
+ o.f *= magic.f; // exponent adjust
+ if (o.f >= was_infnan.f) // make sure Inf/NaN survive
+ o.u |= 255 << 23;
+ o.u |= (h.u & 0x8000) << 16; // sign bit
+ return o.f;
+ }
+
+ static stbir__inline stbir__FP16 stbir__float_to_half(float val)
+ {
+ stbir__FP32 f32infty = { 255 << 23 };
+ stbir__FP32 f16max = { (127 + 16) << 23 };
+ stbir__FP32 denorm_magic = { ((127 - 15) + (23 - 10) + 1) << 23 };
+ unsigned int sign_mask = 0x80000000u;
+ stbir__FP16 o = { 0 };
+ stbir__FP32 f;
+ unsigned int sign;
+
+ f.f = val;
+ sign = f.u & sign_mask;
+ f.u ^= sign;
+
+ if (f.u >= f16max.u) // result is Inf or NaN (all exponent bits set)
+ o.u = (f.u > f32infty.u) ? 0x7e00 : 0x7c00; // NaN->qNaN and Inf->Inf
+ else // (De)normalized number or zero
+ {
+ if (f.u < (113 << 23)) // resulting FP16 is subnormal or zero
+ {
+ // use a magic value to align our 10 mantissa bits at the bottom of
+ // the float. as long as FP addition is round-to-nearest-even this
+ // just works.
+ f.f += denorm_magic.f;
+ // and one integer subtract of the bias later, we have our final float!
+ o.u = (unsigned short) ( f.u - denorm_magic.u );
+ }
+ else
+ {
+ unsigned int mant_odd = (f.u >> 13) & 1; // resulting mantissa is odd
+ // update exponent, rounding bias part 1
+ f.u = f.u + ((15u - 127) << 23) + 0xfff;
+ // rounding bias part 2
+ f.u += mant_odd;
+ // take the bits!
+ o.u = (unsigned short) ( f.u >> 13 );
+ }
+ }
+
+ o.u |= sign >> 16;
+ return o;
+ }
+
+#endif
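The scalar conversions above are Fabian Giesen's bit-twiddling half-float routines. As a self-contained sketch (plain stdint types standing in for the stbir__FP16/stbir__FP32 unions, function names ours), the same logic round-trips exactly representable values:

```c
#include <assert.h>
#include <stdint.h>

// Standalone restatement of the scalar half<->float conversions above,
// using plain uint types instead of the stbir__ unions (illustrative only).
static float half_to_float(uint16_t h)
{
    union { uint32_t u; float f; } o, magic = { (254u - 15u) << 23 }, was_infnan = { (127u + 16u) << 23 };
    o.u = (uint32_t)(h & 0x7fff) << 13;     // exponent/mantissa bits
    o.f *= magic.f;                         // exponent adjust
    if (o.f >= was_infnan.f)                // make sure Inf/NaN survive
        o.u |= 255u << 23;
    o.u |= (uint32_t)(h & 0x8000) << 16;    // sign bit
    return o.f;
}

static uint16_t float_to_half(float val)
{
    union { uint32_t u; float f; } f, f32infty = { 255u << 23 }, f16max = { (127u + 16u) << 23 },
        denorm_magic = { ((127u - 15u) + (23u - 10u) + 1u) << 23 };
    uint16_t o = 0;
    uint32_t sign;

    f.f = val;
    sign = f.u & 0x80000000u;
    f.u ^= sign;

    if (f.u >= f16max.u)                        // Inf or NaN
        o = (f.u > f32infty.u) ? 0x7e00 : 0x7c00;
    else if (f.u < (113u << 23))                // resulting FP16 is subnormal or zero
    {
        f.f += denorm_magic.f;                  // align mantissa via FP addition (RTNE)
        o = (uint16_t)(f.u - denorm_magic.u);   // subtract the bias back out
    }
    else                                        // normal number, round to nearest even
    {
        uint32_t mant_odd = (f.u >> 13) & 1;    // resulting mantissa is odd
        f.u += ((15u - 127u) << 23) + 0xfff;    // exponent adjust + rounding bias part 1
        f.u += mant_odd;                        // rounding bias part 2
        o = (uint16_t)(f.u >> 13);              // take the bits
    }
    return (uint16_t)(o | (sign >> 16));
}
```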
+
+
+#if defined(STBIR_FP16C)
+
+ #include <immintrin.h>
+
+ static stbir__inline void stbir__half_to_float_SIMD(float * output, stbir__FP16 const * input)
+ {
+ _mm256_storeu_ps( (float*)output, _mm256_cvtph_ps( _mm_loadu_si128( (__m128i const* )input ) ) );
+ }
+
+ static stbir__inline void stbir__float_to_half_SIMD(stbir__FP16 * output, float const * input)
+ {
+ _mm_storeu_si128( (__m128i*)output, _mm256_cvtps_ph( _mm256_loadu_ps( input ), 0 ) );
+ }
+
+ static stbir__inline float stbir__half_to_float( stbir__FP16 h )
+ {
+ return _mm_cvtss_f32( _mm_cvtph_ps( _mm_cvtsi32_si128( (int)h.u ) ) );
+ }
+
+ static stbir__inline stbir__FP16 stbir__float_to_half( float f )
+ {
+ stbir__FP16 h;
+ h.u = (unsigned short) _mm_cvtsi128_si32( _mm_cvtps_ph( _mm_set_ss( f ), 0 ) );
+ return h;
+ }
+
+#elif defined(STBIR_SSE2)
+
+ // Fabian's half float routines, see: https://gist.github.com/rygorous/2156668
+ stbir__inline static void stbir__half_to_float_SIMD(float * output, void const * input)
+ {
+ static const STBIR__SIMDI_CONST(mask_nosign, 0x7fff);
+ static const STBIR__SIMDI_CONST(smallest_normal, 0x0400);
+ static const STBIR__SIMDI_CONST(infinity, 0x7c00);
+ static const STBIR__SIMDI_CONST(expadjust_normal, (127 - 15) << 23);
+ static const STBIR__SIMDI_CONST(magic_denorm, 113 << 23);
+
+ __m128i i = _mm_loadu_si128 ( (__m128i const*)(input) );
+ __m128i h = _mm_unpacklo_epi16 ( i, _mm_setzero_si128() );
+ __m128i mnosign = STBIR__CONSTI(mask_nosign);
+ __m128i eadjust = STBIR__CONSTI(expadjust_normal);
+ __m128i smallest = STBIR__CONSTI(smallest_normal);
+ __m128i infty = STBIR__CONSTI(infinity);
+ __m128i expmant = _mm_and_si128(mnosign, h);
+ __m128i justsign = _mm_xor_si128(h, expmant);
+ __m128i b_notinfnan = _mm_cmpgt_epi32(infty, expmant);
+ __m128i b_isdenorm = _mm_cmpgt_epi32(smallest, expmant);
+ __m128i shifted = _mm_slli_epi32(expmant, 13);
+ __m128i adj_infnan = _mm_andnot_si128(b_notinfnan, eadjust);
+ __m128i adjusted = _mm_add_epi32(eadjust, shifted);
+ __m128i den1 = _mm_add_epi32(shifted, STBIR__CONSTI(magic_denorm));
+ __m128i adjusted2 = _mm_add_epi32(adjusted, adj_infnan);
+ __m128 den2 = _mm_sub_ps(_mm_castsi128_ps(den1), *(const __m128 *)&magic_denorm);
+ __m128 adjusted3 = _mm_and_ps(den2, _mm_castsi128_ps(b_isdenorm));
+ __m128 adjusted4 = _mm_andnot_ps(_mm_castsi128_ps(b_isdenorm), _mm_castsi128_ps(adjusted2));
+ __m128 adjusted5 = _mm_or_ps(adjusted3, adjusted4);
+ __m128i sign = _mm_slli_epi32(justsign, 16);
+ __m128 final = _mm_or_ps(adjusted5, _mm_castsi128_ps(sign));
+ stbir__simdf_store( output + 0, final );
+
+ h = _mm_unpackhi_epi16 ( i, _mm_setzero_si128() );
+ expmant = _mm_and_si128(mnosign, h);
+ justsign = _mm_xor_si128(h, expmant);
+ b_notinfnan = _mm_cmpgt_epi32(infty, expmant);
+ b_isdenorm = _mm_cmpgt_epi32(smallest, expmant);
+ shifted = _mm_slli_epi32(expmant, 13);
+ adj_infnan = _mm_andnot_si128(b_notinfnan, eadjust);
+ adjusted = _mm_add_epi32(eadjust, shifted);
+ den1 = _mm_add_epi32(shifted, STBIR__CONSTI(magic_denorm));
+ adjusted2 = _mm_add_epi32(adjusted, adj_infnan);
+ den2 = _mm_sub_ps(_mm_castsi128_ps(den1), *(const __m128 *)&magic_denorm);
+ adjusted3 = _mm_and_ps(den2, _mm_castsi128_ps(b_isdenorm));
+ adjusted4 = _mm_andnot_ps(_mm_castsi128_ps(b_isdenorm), _mm_castsi128_ps(adjusted2));
+ adjusted5 = _mm_or_ps(adjusted3, adjusted4);
+ sign = _mm_slli_epi32(justsign, 16);
+ final = _mm_or_ps(adjusted5, _mm_castsi128_ps(sign));
+ stbir__simdf_store( output + 4, final );
+
+ // ~38 SSE2 ops for 8 values
+ }
+
+ // Fabian's round-to-nearest-even float to half
+  // ~48 SSE2 ops for 8 outputs
+ stbir__inline static void stbir__float_to_half_SIMD(void * output, float const * input)
+ {
+ static const STBIR__SIMDI_CONST(mask_sign, 0x80000000u);
+ static const STBIR__SIMDI_CONST(c_f16max, (127 + 16) << 23); // all FP32 values >=this round to +inf
+ static const STBIR__SIMDI_CONST(c_nanbit, 0x200);
+ static const STBIR__SIMDI_CONST(c_infty_as_fp16, 0x7c00);
+ static const STBIR__SIMDI_CONST(c_min_normal, (127 - 14) << 23); // smallest FP32 that yields a normalized FP16
+ static const STBIR__SIMDI_CONST(c_subnorm_magic, ((127 - 15) + (23 - 10) + 1) << 23);
+ static const STBIR__SIMDI_CONST(c_normal_bias, 0xfff - ((127 - 15) << 23)); // adjust exponent and add mantissa rounding
+
+ __m128 f = _mm_loadu_ps(input);
+ __m128 msign = _mm_castsi128_ps(STBIR__CONSTI(mask_sign));
+ __m128 justsign = _mm_and_ps(msign, f);
+ __m128 absf = _mm_xor_ps(f, justsign);
+ __m128i absf_int = _mm_castps_si128(absf); // the cast is "free" (extra bypass latency, but no thruput hit)
+ __m128i f16max = STBIR__CONSTI(c_f16max);
+ __m128 b_isnan = _mm_cmpunord_ps(absf, absf); // is this a NaN?
+ __m128i b_isregular = _mm_cmpgt_epi32(f16max, absf_int); // (sub)normalized or special?
+ __m128i nanbit = _mm_and_si128(_mm_castps_si128(b_isnan), STBIR__CONSTI(c_nanbit));
+ __m128i inf_or_nan = _mm_or_si128(nanbit, STBIR__CONSTI(c_infty_as_fp16)); // output for specials
+
+ __m128i min_normal = STBIR__CONSTI(c_min_normal);
+ __m128i b_issub = _mm_cmpgt_epi32(min_normal, absf_int);
+
+ // "result is subnormal" path
+ __m128 subnorm1 = _mm_add_ps(absf, _mm_castsi128_ps(STBIR__CONSTI(c_subnorm_magic))); // magic value to round output mantissa
+ __m128i subnorm2 = _mm_sub_epi32(_mm_castps_si128(subnorm1), STBIR__CONSTI(c_subnorm_magic)); // subtract out bias
+
+ // "result is normal" path
+ __m128i mantoddbit = _mm_slli_epi32(absf_int, 31 - 13); // shift bit 13 (mantissa LSB) to sign
+ __m128i mantodd = _mm_srai_epi32(mantoddbit, 31); // -1 if FP16 mantissa odd, else 0
+
+ __m128i round1 = _mm_add_epi32(absf_int, STBIR__CONSTI(c_normal_bias));
+ __m128i round2 = _mm_sub_epi32(round1, mantodd); // if mantissa LSB odd, bias towards rounding up (RTNE)
+ __m128i normal = _mm_srli_epi32(round2, 13); // rounded result
+
+ // combine the two non-specials
+ __m128i nonspecial = _mm_or_si128(_mm_and_si128(subnorm2, b_issub), _mm_andnot_si128(b_issub, normal));
+
+ // merge in specials as well
+ __m128i joined = _mm_or_si128(_mm_and_si128(nonspecial, b_isregular), _mm_andnot_si128(b_isregular, inf_or_nan));
+
+ __m128i sign_shift = _mm_srai_epi32(_mm_castps_si128(justsign), 16);
+ __m128i final2, final= _mm_or_si128(joined, sign_shift);
+
+ f = _mm_loadu_ps(input+4);
+ justsign = _mm_and_ps(msign, f);
+ absf = _mm_xor_ps(f, justsign);
+ absf_int = _mm_castps_si128(absf); // the cast is "free" (extra bypass latency, but no thruput hit)
+ b_isnan = _mm_cmpunord_ps(absf, absf); // is this a NaN?
+ b_isregular = _mm_cmpgt_epi32(f16max, absf_int); // (sub)normalized or special?
+    nanbit = _mm_and_si128(_mm_castps_si128(b_isnan), STBIR__CONSTI(c_nanbit));
+ inf_or_nan = _mm_or_si128(nanbit, STBIR__CONSTI(c_infty_as_fp16)); // output for specials
+
+ b_issub = _mm_cmpgt_epi32(min_normal, absf_int);
+
+ // "result is subnormal" path
+ subnorm1 = _mm_add_ps(absf, _mm_castsi128_ps(STBIR__CONSTI(c_subnorm_magic))); // magic value to round output mantissa
+ subnorm2 = _mm_sub_epi32(_mm_castps_si128(subnorm1), STBIR__CONSTI(c_subnorm_magic)); // subtract out bias
+
+ // "result is normal" path
+ mantoddbit = _mm_slli_epi32(absf_int, 31 - 13); // shift bit 13 (mantissa LSB) to sign
+ mantodd = _mm_srai_epi32(mantoddbit, 31); // -1 if FP16 mantissa odd, else 0
+
+ round1 = _mm_add_epi32(absf_int, STBIR__CONSTI(c_normal_bias));
+ round2 = _mm_sub_epi32(round1, mantodd); // if mantissa LSB odd, bias towards rounding up (RTNE)
+ normal = _mm_srli_epi32(round2, 13); // rounded result
+
+ // combine the two non-specials
+ nonspecial = _mm_or_si128(_mm_and_si128(subnorm2, b_issub), _mm_andnot_si128(b_issub, normal));
+
+ // merge in specials as well
+ joined = _mm_or_si128(_mm_and_si128(nonspecial, b_isregular), _mm_andnot_si128(b_isregular, inf_or_nan));
+
+ sign_shift = _mm_srai_epi32(_mm_castps_si128(justsign), 16);
+ final2 = _mm_or_si128(joined, sign_shift);
+ final = _mm_packs_epi32(final, final2);
+ stbir__simdi_store( output,final );
+ }
+
+#elif defined(STBIR_NEON) && defined(_MSC_VER) && defined(_M_ARM64) && !defined(__clang__) // 64-bit ARM on MSVC (not clang)
+
+ static stbir__inline void stbir__half_to_float_SIMD(float * output, stbir__FP16 const * input)
+ {
+ float16x4_t in0 = vld1_f16(input + 0);
+ float16x4_t in1 = vld1_f16(input + 4);
+ vst1q_f32(output + 0, vcvt_f32_f16(in0));
+ vst1q_f32(output + 4, vcvt_f32_f16(in1));
+ }
+
+ static stbir__inline void stbir__float_to_half_SIMD(stbir__FP16 * output, float const * input)
+ {
+ float16x4_t out0 = vcvt_f16_f32(vld1q_f32(input + 0));
+ float16x4_t out1 = vcvt_f16_f32(vld1q_f32(input + 4));
+ vst1_f16(output+0, out0);
+ vst1_f16(output+4, out1);
+ }
+
+ static stbir__inline float stbir__half_to_float( stbir__FP16 h )
+ {
+ return vgetq_lane_f32(vcvt_f32_f16(vld1_dup_f16(&h)), 0);
+ }
+
+ static stbir__inline stbir__FP16 stbir__float_to_half( float f )
+ {
+ return vget_lane_f16(vcvt_f16_f32(vdupq_n_f32(f)), 0).n16_u16[0];
+ }
+
+#elif defined(STBIR_NEON) && ( defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ ) ) // 64-bit ARM
+
+ static stbir__inline void stbir__half_to_float_SIMD(float * output, stbir__FP16 const * input)
+ {
+ float16x8_t in = vld1q_f16(input);
+ vst1q_f32(output + 0, vcvt_f32_f16(vget_low_f16(in)));
+ vst1q_f32(output + 4, vcvt_f32_f16(vget_high_f16(in)));
+ }
+
+ static stbir__inline void stbir__float_to_half_SIMD(stbir__FP16 * output, float const * input)
+ {
+ float16x4_t out0 = vcvt_f16_f32(vld1q_f32(input + 0));
+ float16x4_t out1 = vcvt_f16_f32(vld1q_f32(input + 4));
+ vst1q_f16(output, vcombine_f16(out0, out1));
+ }
+
+ static stbir__inline float stbir__half_to_float( stbir__FP16 h )
+ {
+ return vgetq_lane_f32(vcvt_f32_f16(vdup_n_f16(h)), 0);
+ }
+
+ static stbir__inline stbir__FP16 stbir__float_to_half( float f )
+ {
+ return vget_lane_f16(vcvt_f16_f32(vdupq_n_f32(f)), 0);
+ }
+
+#elif defined(STBIR_WASM) || (defined(STBIR_NEON) && (defined(_MSC_VER) || defined(_M_ARM) || defined(__arm__))) // WASM or 32-bit ARM on MSVC/clang
+
+ static stbir__inline void stbir__half_to_float_SIMD(float * output, stbir__FP16 const * input)
+ {
+ for (int i=0; i<8; i++)
+ {
+ output[i] = stbir__half_to_float(input[i]);
+ }
+ }
+ static stbir__inline void stbir__float_to_half_SIMD(stbir__FP16 * output, float const * input)
+ {
+ for (int i=0; i<8; i++)
+ {
+ output[i] = stbir__float_to_half(input[i]);
+ }
+ }
+
+#endif
+
+
+#ifdef STBIR_SIMD
+
+#define stbir__simdf_0123to3333( out, reg ) (out) = stbir__simdf_swiz( reg, 3,3,3,3 )
+#define stbir__simdf_0123to2222( out, reg ) (out) = stbir__simdf_swiz( reg, 2,2,2,2 )
+#define stbir__simdf_0123to1111( out, reg ) (out) = stbir__simdf_swiz( reg, 1,1,1,1 )
+#define stbir__simdf_0123to0000( out, reg ) (out) = stbir__simdf_swiz( reg, 0,0,0,0 )
+#define stbir__simdf_0123to0003( out, reg ) (out) = stbir__simdf_swiz( reg, 0,0,0,3 )
+#define stbir__simdf_0123to0001( out, reg ) (out) = stbir__simdf_swiz( reg, 0,0,0,1 )
+#define stbir__simdf_0123to1122( out, reg ) (out) = stbir__simdf_swiz( reg, 1,1,2,2 )
+#define stbir__simdf_0123to2333( out, reg ) (out) = stbir__simdf_swiz( reg, 2,3,3,3 )
+#define stbir__simdf_0123to0023( out, reg ) (out) = stbir__simdf_swiz( reg, 0,0,2,3 )
+#define stbir__simdf_0123to1230( out, reg ) (out) = stbir__simdf_swiz( reg, 1,2,3,0 )
+#define stbir__simdf_0123to2103( out, reg ) (out) = stbir__simdf_swiz( reg, 2,1,0,3 )
+#define stbir__simdf_0123to3210( out, reg ) (out) = stbir__simdf_swiz( reg, 3,2,1,0 )
+#define stbir__simdf_0123to2301( out, reg ) (out) = stbir__simdf_swiz( reg, 2,3,0,1 )
+#define stbir__simdf_0123to3012( out, reg ) (out) = stbir__simdf_swiz( reg, 3,0,1,2 )
+#define stbir__simdf_0123to0011( out, reg ) (out) = stbir__simdf_swiz( reg, 0,0,1,1 )
+#define stbir__simdf_0123to1100( out, reg ) (out) = stbir__simdf_swiz( reg, 1,1,0,0 )
+#define stbir__simdf_0123to2233( out, reg ) (out) = stbir__simdf_swiz( reg, 2,2,3,3 )
+#define stbir__simdf_0123to1133( out, reg ) (out) = stbir__simdf_swiz( reg, 1,1,3,3 )
+#define stbir__simdf_0123to0022( out, reg ) (out) = stbir__simdf_swiz( reg, 0,0,2,2 )
+#define stbir__simdf_0123to1032( out, reg ) (out) = stbir__simdf_swiz( reg, 1,0,3,2 )
+
+typedef union stbir__simdi_u32
+{
+ stbir_uint32 m128i_u32[4];
+ int m128i_i32[4];
+ stbir__simdi m128i_i128;
+} stbir__simdi_u32;
+
+static const int STBIR_mask[9] = { 0,0,0,-1,-1,-1,0,0,0 };
+
+static const STBIR__SIMDF_CONST(STBIR_max_uint8_as_float, stbir__max_uint8_as_float);
+static const STBIR__SIMDF_CONST(STBIR_max_uint16_as_float, stbir__max_uint16_as_float);
+static const STBIR__SIMDF_CONST(STBIR_max_uint8_as_float_inverted, stbir__max_uint8_as_float_inverted);
+static const STBIR__SIMDF_CONST(STBIR_max_uint16_as_float_inverted, stbir__max_uint16_as_float_inverted);
+
+static const STBIR__SIMDF_CONST(STBIR_simd_point5, 0.5f);
+static const STBIR__SIMDF_CONST(STBIR_ones, 1.0f);
+static const STBIR__SIMDI_CONST(STBIR_almost_zero, (127 - 13) << 23);
+static const STBIR__SIMDI_CONST(STBIR_almost_one, 0x3f7fffff);
+static const STBIR__SIMDI_CONST(STBIR_mastissa_mask, 0xff);
+static const STBIR__SIMDI_CONST(STBIR_topscale, 0x02000000);
+
+// Basically, in simd mode, we unroll the proper amount, and we don't want
+// the non-simd remnant loops to be unrolled, because they only run a few times.
+// Adding this switch saves about 5K on clang, which is Captain Unroll the 3rd.
+#define STBIR_SIMD_STREAMOUT_PTR( star ) STBIR_STREAMOUT_PTR( star )
+#define STBIR_SIMD_NO_UNROLL(ptr) STBIR_NO_UNROLL(ptr)
+#define STBIR_SIMD_NO_UNROLL_LOOP_START STBIR_NO_UNROLL_LOOP_START
+#define STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR STBIR_NO_UNROLL_LOOP_START_INF_FOR
+
+#ifdef STBIR_MEMCPY
+#undef STBIR_MEMCPY
+#endif
+#define STBIR_MEMCPY stbir_simd_memcpy
+
+// override normal use of memcpy with much simpler copy (faster and smaller with our sized copies)
+static void stbir_simd_memcpy( void * dest, void const * src, size_t bytes )
+{
+ char STBIR_SIMD_STREAMOUT_PTR (*) d = (char*) dest;
+ char STBIR_SIMD_STREAMOUT_PTR( * ) d_end = ((char*) dest) + bytes;
+ ptrdiff_t ofs_to_src = (char*)src - (char*)dest;
+
+ // check overlaps
+ STBIR_ASSERT( ( ( d >= ( (char*)src) + bytes ) ) || ( ( d + bytes ) <= (char*)src ) );
+
+ if ( bytes < (16*stbir__simdfX_float_count) )
+ {
+ if ( bytes < 16 )
+ {
+ if ( bytes )
+ {
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do
+ {
+ STBIR_SIMD_NO_UNROLL(d);
+ d[ 0 ] = d[ ofs_to_src ];
+ ++d;
+ } while ( d < d_end );
+ }
+ }
+ else
+ {
+ stbir__simdf x;
+ // do one unaligned to get us aligned for the stream out below
+ stbir__simdf_load( x, ( d + ofs_to_src ) );
+ stbir__simdf_store( d, x );
+ d = (char*)( ( ( (size_t)d ) + 16 ) & ~15 );
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ STBIR_SIMD_NO_UNROLL(d);
+
+ if ( d > ( d_end - 16 ) )
+ {
+ if ( d == d_end )
+ return;
+ d = d_end - 16;
+ }
+
+ stbir__simdf_load( x, ( d + ofs_to_src ) );
+ stbir__simdf_store( d, x );
+ d += 16;
+ }
+ }
+ }
+ else
+ {
+ stbir__simdfX x0,x1,x2,x3;
+
+ // do one unaligned to get us aligned for the stream out below
+ stbir__simdfX_load( x0, ( d + ofs_to_src ) + 0*stbir__simdfX_float_count );
+ stbir__simdfX_load( x1, ( d + ofs_to_src ) + 4*stbir__simdfX_float_count );
+ stbir__simdfX_load( x2, ( d + ofs_to_src ) + 8*stbir__simdfX_float_count );
+ stbir__simdfX_load( x3, ( d + ofs_to_src ) + 12*stbir__simdfX_float_count );
+ stbir__simdfX_store( d + 0*stbir__simdfX_float_count, x0 );
+ stbir__simdfX_store( d + 4*stbir__simdfX_float_count, x1 );
+ stbir__simdfX_store( d + 8*stbir__simdfX_float_count, x2 );
+ stbir__simdfX_store( d + 12*stbir__simdfX_float_count, x3 );
+ d = (char*)( ( ( (size_t)d ) + (16*stbir__simdfX_float_count) ) & ~((16*stbir__simdfX_float_count)-1) );
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ STBIR_SIMD_NO_UNROLL(d);
+
+ if ( d > ( d_end - (16*stbir__simdfX_float_count) ) )
+ {
+ if ( d == d_end )
+ return;
+ d = d_end - (16*stbir__simdfX_float_count);
+ }
+
+ stbir__simdfX_load( x0, ( d + ofs_to_src ) + 0*stbir__simdfX_float_count );
+ stbir__simdfX_load( x1, ( d + ofs_to_src ) + 4*stbir__simdfX_float_count );
+ stbir__simdfX_load( x2, ( d + ofs_to_src ) + 8*stbir__simdfX_float_count );
+ stbir__simdfX_load( x3, ( d + ofs_to_src ) + 12*stbir__simdfX_float_count );
+ stbir__simdfX_store( d + 0*stbir__simdfX_float_count, x0 );
+ stbir__simdfX_store( d + 4*stbir__simdfX_float_count, x1 );
+ stbir__simdfX_store( d + 8*stbir__simdfX_float_count, x2 );
+ stbir__simdfX_store( d + 12*stbir__simdfX_float_count, x3 );
+ d += (16*stbir__simdfX_float_count);
+ }
+ }
+}
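stbir_simd_memcpy above stores one unaligned vector, then rounds the destination pointer up past the next 16-byte boundary so all later stores are aligned; the round-up always advances, even from an already-aligned address, because the first 16 bytes are already written. A tiny sketch of that round-up (helper name is ours):

```c
#include <assert.h>
#include <stddef.h>

// Align-then-stream round-up: advance strictly past the current position to
// the next multiple of `a` (a power of two), mirroring
//   d = (char*)( ( ( (size_t)d ) + 16 ) & ~15 );
static size_t align_up_past(size_t p, size_t a)
{
    return (p + a) & ~(a - 1);
}
```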
+
+// memcpy that is specifically, intentionally overlapping (src is smaller than dest, so it can
+// be a normal forward copy; bytes is divisible by 4 and bytes is greater than or equal to
+// the diff between dest and src)
+static void stbir_overlapping_memcpy( void * dest, void const * src, size_t bytes )
+{
+ char STBIR_SIMD_STREAMOUT_PTR (*) sd = (char*) src;
+ char STBIR_SIMD_STREAMOUT_PTR( * ) s_end = ((char*) src) + bytes;
+ ptrdiff_t ofs_to_dest = (char*)dest - (char*)src;
+
+ if ( ofs_to_dest >= 16 ) // is the overlap more than 16 away?
+ {
+ char STBIR_SIMD_STREAMOUT_PTR( * ) s_end16 = ((char*) src) + (bytes&~15);
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do
+ {
+ stbir__simdf x;
+ STBIR_SIMD_NO_UNROLL(sd);
+ stbir__simdf_load( x, sd );
+ stbir__simdf_store( ( sd + ofs_to_dest ), x );
+ sd += 16;
+ } while ( sd < s_end16 );
+
+ if ( sd == s_end )
+ return;
+ }
+
+ do
+ {
+ STBIR_SIMD_NO_UNROLL(sd);
+ *(int*)( sd + ofs_to_dest ) = *(int*) sd;
+ sd += 4;
+ } while ( sd < s_end );
+}
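The overlapping copy's contract is worth spelling out: because dest sits after src in the same buffer, a front-to-back copy replicates the leading bytes with period (dest - src). A scalar model of the 4-byte tail loop (forward_copy4 is illustrative, not stb's code):

```c
#include <assert.h>
#include <string.h>

// Scalar model of stbir_overlapping_memcpy's tail loop: a forward copy in
// 4-byte units. With dest *after* src in the same buffer, earlier writes are
// re-read later, so the leading bytes repeat with period (dest - src).
static void forward_copy4(void *dest, void const *src, size_t bytes)
{
    char *d = (char *)dest;
    char const *s = (char const *)src;
    size_t i;
    for (i = 0; i < bytes; i += 4)
        memcpy(d + i, s + i, 4);   // 4 bytes at a time, front to back
}
```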
+
+#else // no SSE2
+
+// when in scalar mode, we let unrolling happen, so this macro just does the __restrict
+#define STBIR_SIMD_STREAMOUT_PTR( star ) STBIR_STREAMOUT_PTR( star )
+#define STBIR_SIMD_NO_UNROLL(ptr)
+#define STBIR_SIMD_NO_UNROLL_LOOP_START
+#define STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+
+#endif // SSE2
+
+
+#ifdef STBIR_PROFILE
+
+#ifndef STBIR_PROFILE_FUNC
+
+#if defined(_x86_64) || defined( __x86_64__ ) || defined( _M_X64 ) || defined(__x86_64) || defined(__SSE2__) || defined(STBIR_SSE) || defined( _M_IX86_FP ) || defined(__i386) || defined( __i386__ ) || defined( _M_IX86 ) || defined( _X86_ )
+
+#ifdef _MSC_VER
+
+ STBIRDEF stbir_uint64 __rdtsc();
+ #define STBIR_PROFILE_FUNC() __rdtsc()
+
+#else // non msvc
+
+ static stbir__inline stbir_uint64 STBIR_PROFILE_FUNC()
+ {
+ stbir_uint32 lo, hi;
+ asm volatile ("rdtsc" : "=a" (lo), "=d" (hi) );
+ return ( ( (stbir_uint64) hi ) << 32 ) | ( (stbir_uint64) lo );
+ }
+
+#endif // msvc
+
+#elif defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ ) || defined(__ARM_NEON__)
+
+#if defined( _MSC_VER ) && !defined(__clang__)
+
+ #define STBIR_PROFILE_FUNC() _ReadStatusReg(ARM64_CNTVCT)
+
+#else
+
+ static stbir__inline stbir_uint64 STBIR_PROFILE_FUNC()
+ {
+ stbir_uint64 tsc;
+ asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
+ return tsc;
+ }
+
+#endif
+
+#else // x64, arm
+
+#error Unknown platform for profiling.
+
+#endif // x64, arm
+
+#endif // STBIR_PROFILE_FUNC
+
+#define STBIR_ONLY_PROFILE_GET_SPLIT_INFO ,stbir__per_split_info * split_info
+#define STBIR_ONLY_PROFILE_SET_SPLIT_INFO ,split_info
+
+#define STBIR_ONLY_PROFILE_BUILD_GET_INFO ,stbir__info * profile_info
+#define STBIR_ONLY_PROFILE_BUILD_SET_INFO ,profile_info
+
+// super light-weight micro profiler
+#define STBIR_PROFILE_START_ll( info, wh ) { stbir_uint64 wh##thiszonetime = STBIR_PROFILE_FUNC(); stbir_uint64 * wh##save_parent_excluded_ptr = info->current_zone_excluded_ptr; stbir_uint64 wh##current_zone_excluded = 0; info->current_zone_excluded_ptr = &wh##current_zone_excluded;
+#define STBIR_PROFILE_END_ll( info, wh ) wh##thiszonetime = STBIR_PROFILE_FUNC() - wh##thiszonetime; info->profile.named.wh += wh##thiszonetime - wh##current_zone_excluded; *wh##save_parent_excluded_ptr += wh##thiszonetime; info->current_zone_excluded_ptr = wh##save_parent_excluded_ptr; }
+#define STBIR_PROFILE_FIRST_START_ll( info, wh ) { int i; info->current_zone_excluded_ptr = &info->profile.named.total; for(i=0;i<STBIR__ARRAY_SIZE(info->profile.array);i++) info->profile.array[i]=0; } STBIR_PROFILE_START_ll( info, wh );
+#define STBIR_PROFILE_CLEAR_EXTRAS_ll( info, num ) { int extra; for(extra=1;extra<(num);extra++) { int i; for(i=0;i<STBIR__ARRAY_SIZE((info)->profile.array);i++) (info)[extra].profile.array[i]=0; } }
+
+// for thread data
+#define STBIR_PROFILE_START( wh ) STBIR_PROFILE_START_ll( split_info, wh )
+#define STBIR_PROFILE_END( wh ) STBIR_PROFILE_END_ll( split_info, wh )
+#define STBIR_PROFILE_FIRST_START( wh ) STBIR_PROFILE_FIRST_START_ll( split_info, wh )
+#define STBIR_PROFILE_CLEAR_EXTRAS() STBIR_PROFILE_CLEAR_EXTRAS_ll( split_info, split_count )
+
+// for build data
+#define STBIR_PROFILE_BUILD_START( wh ) STBIR_PROFILE_START_ll( profile_info, wh )
+#define STBIR_PROFILE_BUILD_END( wh ) STBIR_PROFILE_END_ll( profile_info, wh )
+#define STBIR_PROFILE_BUILD_FIRST_START( wh ) STBIR_PROFILE_FIRST_START_ll( profile_info, wh )
+#define STBIR_PROFILE_BUILD_CLEAR( info ) { int i; for(i=0;i<STBIR__ARRAY_SIZE(info->profile.array);i++) info->profile.array[i]=0; }
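The zone-exclusion scheme those macros implement can be restated without macros: each zone measures its own span, subtracts time attributed to nested zones, and reports its whole span upward so the parent can exclude it in turn. A deterministic sketch with a fake clock (all names here are ours, not stb's):

```c
#include <assert.h>

static unsigned long long fake_clock = 0; // deterministic stand-in for rdtsc/cntvct

typedef struct Zone
{
    unsigned long long start, child_excluded, self;
    struct Zone *parent;
} Zone;

static Zone *current = 0;

static void zone_start(Zone *z)
{
    z->start = fake_clock;
    z->child_excluded = 0;
    z->parent = current;
    current = z;
}

static void zone_end(Zone *z)
{
    unsigned long long span = fake_clock - z->start;
    z->self = span - z->child_excluded;   // own time minus nested zones
    current = z->parent;
    if (current)
        current->child_excluded += span;  // parent excludes our whole span
}
```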
+
+#else // no profile
+
+#define STBIR_ONLY_PROFILE_GET_SPLIT_INFO
+#define STBIR_ONLY_PROFILE_SET_SPLIT_INFO
+
+#define STBIR_ONLY_PROFILE_BUILD_GET_INFO
+#define STBIR_ONLY_PROFILE_BUILD_SET_INFO
+
+#define STBIR_PROFILE_START( wh )
+#define STBIR_PROFILE_END( wh )
+#define STBIR_PROFILE_FIRST_START( wh )
+#define STBIR_PROFILE_CLEAR_EXTRAS( )
+
+#define STBIR_PROFILE_BUILD_START( wh )
+#define STBIR_PROFILE_BUILD_END( wh )
+#define STBIR_PROFILE_BUILD_FIRST_START( wh )
+#define STBIR_PROFILE_BUILD_CLEAR( info )
+
+#endif // stbir_profile
+
+#ifndef STBIR_CEILF
+#include <math.h>
+#if defined(_MSC_VER) && _MSC_VER <= 1200 // support VC6 for Sean
+#define STBIR_CEILF(x) ((float)ceil((float)(x)))
+#define STBIR_FLOORF(x) ((float)floor((float)(x)))
+#else
+#define STBIR_CEILF(x) ceilf(x)
+#define STBIR_FLOORF(x) floorf(x)
+#endif
+#endif
+
+#ifndef STBIR_MEMCPY
+// For memcpy
+#include <string.h>
+#define STBIR_MEMCPY( dest, src, len ) memcpy( dest, src, len )
+#endif
+
+#ifndef STBIR_SIMD
+
+// memcpy that is specifically, intentionally overlapping (src is smaller than dest, so it can
+// be a normal forward copy; bytes is divisible by 4 and bytes is greater than or equal to
+// the diff between dest and src)
+static void stbir_overlapping_memcpy( void * dest, void const * src, size_t bytes )
+{
+ char STBIR_SIMD_STREAMOUT_PTR (*) sd = (char*) src;
+ char STBIR_SIMD_STREAMOUT_PTR( * ) s_end = ((char*) src) + bytes;
+ ptrdiff_t ofs_to_dest = (char*)dest - (char*)src;
+
+ if ( ofs_to_dest >= 8 ) // is the overlap more than 8 away?
+ {
+ char STBIR_SIMD_STREAMOUT_PTR( * ) s_end8 = ((char*) src) + (bytes&~7);
+ STBIR_NO_UNROLL_LOOP_START
+ do
+ {
+ STBIR_NO_UNROLL(sd);
+ *(stbir_uint64*)( sd + ofs_to_dest ) = *(stbir_uint64*) sd;
+ sd += 8;
+ } while ( sd < s_end8 );
+
+ if ( sd == s_end )
+ return;
+ }
+
+ STBIR_NO_UNROLL_LOOP_START
+ do
+ {
+ STBIR_NO_UNROLL(sd);
+ *(int*)( sd + ofs_to_dest ) = *(int*) sd;
+ sd += 4;
+ } while ( sd < s_end );
+}
+
+#endif
+
+static float stbir__filter_trapezoid(float x, float scale, void * user_data)
+{
+ float halfscale = scale / 2;
+ float t = 0.5f + halfscale;
+ STBIR_ASSERT(scale <= 1);
+ STBIR__UNUSED(user_data);
+
+ if ( x < 0.0f ) x = -x;
+
+ if (x >= t)
+ return 0.0f;
+ else
+ {
+ float r = 0.5f - halfscale;
+ if (x <= r)
+ return 1.0f;
+ else
+ return (t - x) / scale;
+ }
+}
+
+static float stbir__support_trapezoid(float scale, void * user_data)
+{
+ STBIR__UNUSED(user_data);
+ return 0.5f + scale / 2.0f;
+}
+
+static float stbir__filter_triangle(float x, float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+
+ if ( x < 0.0f ) x = -x;
+
+ if (x <= 1.0f)
+ return 1.0f - x;
+ else
+ return 0.0f;
+}
+
+static float stbir__filter_point(float x, float s, void * user_data)
+{
+ STBIR__UNUSED(x);
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+
+ return 1.0f;
+}
+
+static float stbir__filter_cubic(float x, float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+
+ if ( x < 0.0f ) x = -x;
+
+ if (x < 1.0f)
+ return (4.0f + x*x*(3.0f*x - 6.0f))/6.0f;
+ else if (x < 2.0f)
+ return (8.0f + x*(-12.0f + x*(6.0f - x)))/6.0f;
+
+ return (0.0f);
+}
+
+static float stbir__filter_catmullrom(float x, float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+
+ if ( x < 0.0f ) x = -x;
+
+ if (x < 1.0f)
+ return 1.0f - x*x*(2.5f - 1.5f*x);
+ else if (x < 2.0f)
+ return 2.0f - x*(4.0f + x*(0.5f*x - 2.5f));
+
+ return (0.0f);
+}
+
+static float stbir__filter_mitchell(float x, float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+
+ if ( x < 0.0f ) x = -x;
+
+ if (x < 1.0f)
+ return (16.0f + x*x*(21.0f * x - 36.0f))/18.0f;
+ else if (x < 2.0f)
+ return (32.0f + x*(-60.0f + x*(36.0f - 7.0f*x)))/18.0f;
+
+ return (0.0f);
+}
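The Mitchell-Netravali kernel above (B = C = 1/3) is normalized so that integer-spaced taps sum to 1 at any phase, which is what keeps flat regions flat after resampling. A standalone check:

```c
#include <assert.h>
#include <math.h>

// Mitchell-Netravali kernel with B = C = 1/3, as in stbir__filter_mitchell.
static float mitchell(float x)
{
    x = fabsf(x);
    if (x < 1.0f)
        return (16.0f + x * x * (21.0f * x - 36.0f)) / 18.0f;
    if (x < 2.0f)
        return (32.0f + x * (-60.0f + x * (36.0f - 7.0f * x))) / 18.0f;
    return 0.0f;
}

// Sum the taps at integer offsets around an arbitrary phase; the kernel's
// support is 2, so offsets beyond +/-3 contribute nothing.
static float tap_sum(float phase)
{
    float sum = 0.0f;
    int k;
    for (k = -3; k <= 3; k++)
        sum += mitchell(phase + (float)k);
    return sum;
}
```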
+
+static float stbir__support_zeropoint5(float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+ return 0.5f;
+}
+
+static float stbir__support_one(float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+ return 1;
+}
+
+static float stbir__support_two(float s, void * user_data)
+{
+ STBIR__UNUSED(s);
+ STBIR__UNUSED(user_data);
+ return 2;
+}
+
+// This is the maximum number of input samples that can affect an output sample
+// with the given filter from the output pixel's perspective
+static int stbir__get_filter_pixel_width(stbir__support_callback * support, float scale, void * user_data)
+{
+ STBIR_ASSERT(support != 0);
+
+ if ( scale >= ( 1.0f-stbir__small_float ) ) // upscale
+ return (int)STBIR_CEILF(support(1.0f/scale,user_data) * 2.0f);
+ else
+ return (int)STBIR_CEILF(support(scale,user_data) * 2.0f / scale);
+}
+
+// this is how many coefficients per run of the filter (which is different
+// from the filter_pixel_width depending on whether we are scattering or gathering)
+static int stbir__get_coefficient_width(stbir__sampler * samp, int is_gather, void * user_data)
+{
+ float scale = samp->scale_info.scale;
+ stbir__support_callback * support = samp->filter_support;
+
+ switch( is_gather )
+ {
+ case 1:
+ return (int)STBIR_CEILF(support(1.0f / scale, user_data) * 2.0f);
+ case 2:
+ return (int)STBIR_CEILF(support(scale, user_data) * 2.0f / scale);
+ case 0:
+ return (int)STBIR_CEILF(support(scale, user_data) * 2.0f);
+ default:
+ STBIR_ASSERT( (is_gather >= 0 ) && (is_gather <= 2 ) );
+ return 0;
+ }
+}
+
+static int stbir__get_contributors(stbir__sampler * samp, int is_gather)
+{
+ if (is_gather)
+ return samp->scale_info.output_sub_size;
+ else
+ return (samp->scale_info.input_full_size + samp->filter_pixel_margin * 2);
+}
+
+static int stbir__edge_zero_full( int n, int max )
+{
+ STBIR__UNUSED(n);
+ STBIR__UNUSED(max);
+ return 0; // NOTREACHED
+}
+
+static int stbir__edge_clamp_full( int n, int max )
+{
+ if (n < 0)
+ return 0;
+
+ if (n >= max)
+ return max - 1;
+
+ return n; // NOTREACHED
+}
+
+static int stbir__edge_reflect_full( int n, int max )
+{
+ if (n < 0)
+ {
+ if (n > -max)
+ return -n;
+ else
+ return max - 1;
+ }
+
+ if (n >= max)
+ {
+ int max2 = max * 2;
+ if (n >= max2)
+ return 0;
+ else
+ return max2 - n - 1;
+ }
+
+ return n; // NOTREACHED
+}
+
+static int stbir__edge_wrap_full( int n, int max )
+{
+ if (n >= 0)
+ return (n % max);
+ else
+ {
+ int m = (-n) % max;
+
+ if (m != 0)
+ m = max - m;
+
+ return (m);
+ }
+}
+
+typedef int stbir__edge_wrap_func( int n, int max );
+static stbir__edge_wrap_func * stbir__edge_wrap_slow[] =
+{
+ stbir__edge_clamp_full, // STBIR_EDGE_CLAMP
+ stbir__edge_reflect_full, // STBIR_EDGE_REFLECT
+ stbir__edge_wrap_full, // STBIR_EDGE_WRAP
+ stbir__edge_zero_full, // STBIR_EDGE_ZERO
+};
+
+stbir__inline static int stbir__edge_wrap(stbir_edge edge, int n, int max)
+{
+ // avoid per-pixel switch
+ if (n >= 0 && n < max)
+ return n;
+ return stbir__edge_wrap_slow[edge]( n, max );
+}
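The slow-path edge mappings above can be restated as standalone functions, one per mode, which makes their behavior on out-of-range indices easy to spot-check (illustrative only):

```c
#include <assert.h>

// Map an index n into [0, max) per edge mode, mirroring the *_full functions.
static int edge_clamp(int n, int max)
{
    if (n < 0) return 0;
    if (n >= max) return max - 1;
    return n;
}

static int edge_reflect(int n, int max)
{
    if (n < 0)
        return (n > -max) ? -n : max - 1;       // mirror, saturating far out
    if (n >= max)
        return (n >= max * 2) ? 0 : max * 2 - n - 1;
    return n;
}

static int edge_wrap(int n, int max)
{
    int m;
    if (n >= 0) return n % max;
    m = (-n) % max;                              // wrap negatives upward
    return (m != 0) ? max - m : 0;
}
```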
+
+#define STBIR__MERGE_RUNS_PIXEL_THRESHOLD 16
+
+// get information on the extents of a sampler
+static void stbir__get_extents( stbir__sampler * samp, stbir__extents * scanline_extents )
+{
+ int j, stop;
+ int left_margin, right_margin;
+ int min_n = 0x7fffffff, max_n = -0x7fffffff;
+ int min_left = 0x7fffffff, max_left = -0x7fffffff;
+ int min_right = 0x7fffffff, max_right = -0x7fffffff;
+ stbir_edge edge = samp->edge;
+ stbir__contributors* contributors = samp->contributors;
+ int output_sub_size = samp->scale_info.output_sub_size;
+ int input_full_size = samp->scale_info.input_full_size;
+ int filter_pixel_margin = samp->filter_pixel_margin;
+
+ STBIR_ASSERT( samp->is_gather );
+
+ stop = output_sub_size;
+ for (j = 0; j < stop; j++ )
+ {
+ STBIR_ASSERT( contributors[j].n1 >= contributors[j].n0 );
+ if ( contributors[j].n0 < min_n )
+ {
+ min_n = contributors[j].n0;
+ stop = j + filter_pixel_margin; // if we find a new min, only scan another filter width
+ if ( stop > output_sub_size ) stop = output_sub_size;
+ }
+ }
+
+ stop = 0;
+ for (j = output_sub_size - 1; j >= stop; j-- )
+ {
+ STBIR_ASSERT( contributors[j].n1 >= contributors[j].n0 );
+ if ( contributors[j].n1 > max_n )
+ {
+ max_n = contributors[j].n1;
+ stop = j - filter_pixel_margin; // if we find a new max, only scan another filter width
+ if (stop<0) stop = 0;
+ }
+ }
+
+ STBIR_ASSERT( scanline_extents->conservative.n0 <= min_n );
+ STBIR_ASSERT( scanline_extents->conservative.n1 >= max_n );
+
+ // now calculate how much into the margins we really read
+ left_margin = 0;
+ if ( min_n < 0 )
+ {
+ left_margin = -min_n;
+ min_n = 0;
+ }
+
+ right_margin = 0;
+ if ( max_n >= input_full_size )
+ {
+ right_margin = max_n - input_full_size + 1;
+ max_n = input_full_size - 1;
+ }
+
+ // edge_sizes holds the margin pixel extents (how many pixels we hang over each edge)
+ scanline_extents->edge_sizes[0] = left_margin;
+ scanline_extents->edge_sizes[1] = right_margin;
+
+ // spans[0] is the range of pixels read from the input
+ scanline_extents->spans[0].n0 = min_n;
+ scanline_extents->spans[0].n1 = max_n;
+ scanline_extents->spans[0].pixel_offset_for_input = min_n;
+
+ // default to no other input range
+ scanline_extents->spans[1].n0 = 0;
+ scanline_extents->spans[1].n1 = -1;
+ scanline_extents->spans[1].pixel_offset_for_input = 0;
+
+ // don't have to do edge calc for zero clamp
+ if ( edge == STBIR_EDGE_ZERO )
+ return;
+
+ // convert margin pixels to the pixels within the input (min and max)
+ for( j = -left_margin ; j < 0 ; j++ )
+ {
+ int p = stbir__edge_wrap( edge, j, input_full_size );
+ if ( p < min_left )
+ min_left = p;
+ if ( p > max_left )
+ max_left = p;
+ }
+
+ for( j = input_full_size ; j < (input_full_size + right_margin) ; j++ )
+ {
+ int p = stbir__edge_wrap( edge, j, input_full_size );
+ if ( p < min_right )
+ min_right = p;
+ if ( p > max_right )
+ max_right = p;
+ }
+
+ // merge the left margin pixel region if it connects within STBIR__MERGE_RUNS_PIXEL_THRESHOLD pixels of the main pixel region
+ if ( min_left != 0x7fffffff )
+ {
+ if ( ( ( min_left <= min_n ) && ( ( max_left + STBIR__MERGE_RUNS_PIXEL_THRESHOLD ) >= min_n ) ) ||
+ ( ( min_n <= min_left ) && ( ( max_n + STBIR__MERGE_RUNS_PIXEL_THRESHOLD ) >= max_left ) ) )
+ {
+ scanline_extents->spans[0].n0 = min_n = stbir__min( min_n, min_left );
+ scanline_extents->spans[0].n1 = max_n = stbir__max( max_n, max_left );
+ scanline_extents->spans[0].pixel_offset_for_input = min_n;
+ left_margin = 0;
+ }
+ }
+
+ // merge the right margin pixel region if it connects within STBIR__MERGE_RUNS_PIXEL_THRESHOLD pixels of the main pixel region
+ if ( min_right != 0x7fffffff )
+ {
+ if ( ( ( min_right <= min_n ) && ( ( max_right + STBIR__MERGE_RUNS_PIXEL_THRESHOLD ) >= min_n ) ) ||
+ ( ( min_n <= min_right ) && ( ( max_n + STBIR__MERGE_RUNS_PIXEL_THRESHOLD ) >= max_right ) ) )
+ {
+ scanline_extents->spans[0].n0 = min_n = stbir__min( min_n, min_right );
+ scanline_extents->spans[0].n1 = max_n = stbir__max( max_n, max_right );
+ scanline_extents->spans[0].pixel_offset_for_input = min_n;
+ right_margin = 0;
+ }
+ }
+
+ STBIR_ASSERT( scanline_extents->conservative.n0 <= min_n );
+ STBIR_ASSERT( scanline_extents->conservative.n1 >= max_n );
+
+ // you get two ranges when you have the WRAP edge mode and you are doing just a piece of the
+ // resize, so you need to get a second run of pixels from the opposite side of the scanline
+ // (which you wouldn't need except for WRAP)
+
+
+ // if we can't merge the min_left range, add it as a second range
+ if ( ( left_margin ) && ( min_left != 0x7fffffff ) )
+ {
+ stbir__span * newspan = scanline_extents->spans + 1;
+ STBIR_ASSERT( right_margin == 0 );
+ if ( min_left < scanline_extents->spans[0].n0 )
+ {
+ scanline_extents->spans[1].pixel_offset_for_input = scanline_extents->spans[0].n0;
+ scanline_extents->spans[1].n0 = scanline_extents->spans[0].n0;
+ scanline_extents->spans[1].n1 = scanline_extents->spans[0].n1;
+ --newspan;
+ }
+ newspan->pixel_offset_for_input = min_left;
+ newspan->n0 = -left_margin;
+ newspan->n1 = ( max_left - min_left ) - left_margin;
+ scanline_extents->edge_sizes[0] = 0; // don't need to copy the left margin, since we are directly decoding into the margin
+ }
+ // if we can't merge the min_right range, add it as a second range
+ else
+ if ( ( right_margin ) && ( min_right != 0x7fffffff ) )
+ {
+ stbir__span * newspan = scanline_extents->spans + 1;
+ if ( min_right < scanline_extents->spans[0].n0 )
+ {
+ scanline_extents->spans[1].pixel_offset_for_input = scanline_extents->spans[0].n0;
+ scanline_extents->spans[1].n0 = scanline_extents->spans[0].n0;
+ scanline_extents->spans[1].n1 = scanline_extents->spans[0].n1;
+ --newspan;
+ }
+ newspan->pixel_offset_for_input = min_right;
+ newspan->n0 = scanline_extents->spans[1].n1 + 1;
+ newspan->n1 = scanline_extents->spans[1].n1 + 1 + ( max_right - min_right );
+ scanline_extents->edge_sizes[1] = 0; // don't need to copy the right margin, since we are directly decoding into the margin
+ }
+
+ // sort the spans into write output order
+ if ( ( scanline_extents->spans[1].n1 > scanline_extents->spans[1].n0 ) && ( scanline_extents->spans[0].n0 > scanline_extents->spans[1].n0 ) )
+ {
+ stbir__span tspan = scanline_extents->spans[0];
+ scanline_extents->spans[0] = scanline_extents->spans[1];
+ scanline_extents->spans[1] = tspan;
+ }
+}
+
+static void stbir__calculate_in_pixel_range( int * first_pixel, int * last_pixel, float out_pixel_center, float out_filter_radius, float inv_scale, float out_shift, int input_size, stbir_edge edge )
+{
+ int first, last;
+ float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
+ float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
+
+ float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) * inv_scale;
+ float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) * inv_scale;
+
+ first = (int)(STBIR_FLOORF(in_pixel_influence_lowerbound + 0.5f));
+ last = (int)(STBIR_FLOORF(in_pixel_influence_upperbound - 0.5f));
+ if ( last < first ) last = first; // point sample mode can span a value *right* at 0.5, and cause these to cross
+
+ if ( edge == STBIR_EDGE_WRAP )
+ {
+ if ( first < -input_size )
+ first = -input_size;
+ if ( last >= (input_size*2))
+ last = (input_size*2) - 1;
+ }
+
+ *first_pixel = first;
+ *last_pixel = last;
+}
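+
+ // example (a sketch): for a 2x linear upsample (inv_scale = 0.5, out_shift = 0,
+ // out_filter_radius = 2), output pixel 0 has center 0.5, so its influence range is
+ // [-1.5, 2.5] in output space and [-0.75, 1.25] in input space, giving
+ // first = (int)floor(-0.25) = -1 and last = (int)floor(0.75) = 0 (edge-wrapped later).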
+
+static void stbir__calculate_coefficients_for_gather_upsample( float out_filter_radius, stbir__kernel_callback * kernel, stbir__scale_info * scale_info, int num_contributors, stbir__contributors* contributors, float* coefficient_group, int coefficient_width, stbir_edge edge, void * user_data )
+{
+ int n, end;
+ float inv_scale = scale_info->inv_scale;
+ float out_shift = scale_info->pixel_shift;
+ int input_size = scale_info->input_full_size;
+ int numerator = scale_info->scale_numerator;
+ int polyphase = ( ( scale_info->scale_is_rational ) && ( numerator < num_contributors ) );
+
+ // loop through the output pixels
+ end = num_contributors; if ( polyphase ) end = numerator;
+ for (n = 0; n < end; n++)
+ {
+ int i;
+ int last_non_zero;
+ float out_pixel_center = (float)n + 0.5f;
+ float in_center_of_out = (out_pixel_center + out_shift) * inv_scale;
+
+ int in_first_pixel, in_last_pixel;
+
+ stbir__calculate_in_pixel_range( &in_first_pixel, &in_last_pixel, out_pixel_center, out_filter_radius, inv_scale, out_shift, input_size, edge );
+
+ // make sure we never generate a range larger than our precalculated coeff width
+ // this only happens in point sample mode, but it's a good safe thing to do anyway
+ if ( ( in_last_pixel - in_first_pixel + 1 ) > coefficient_width )
+ in_last_pixel = in_first_pixel + coefficient_width - 1;
+
+ last_non_zero = -1;
+ for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
+ {
+ float in_pixel_center = (float)(i + in_first_pixel) + 0.5f;
+ float coeff = kernel(in_center_of_out - in_pixel_center, inv_scale, user_data);
+
+ // kill denormals
+ if ( ( ( coeff < stbir__small_float ) && ( coeff > -stbir__small_float ) ) )
+ {
+ if ( i == 0 ) // if we're at the front, just eat zero contributors
+ {
+ STBIR_ASSERT ( ( in_last_pixel - in_first_pixel ) != 0 ); // there should be at least one contrib
+ ++in_first_pixel;
+ i--;
+ continue;
+ }
+ coeff = 0; // make sure is fully zero (should keep denormals away)
+ }
+ else
+ last_non_zero = i;
+
+ coefficient_group[i] = coeff;
+ }
+
+ in_last_pixel = last_non_zero+in_first_pixel; // kills trailing zeros
+ contributors->n0 = in_first_pixel;
+ contributors->n1 = in_last_pixel;
+
+ STBIR_ASSERT(contributors->n1 >= contributors->n0);
+
+ ++contributors;
+ coefficient_group += coefficient_width;
+ }
+}
+
+static void stbir__insert_coeff( stbir__contributors * contribs, float * coeffs, int new_pixel, float new_coeff, int max_width )
+{
+ if ( new_pixel <= contribs->n1 ) // before the end
+ {
+ if ( new_pixel < contribs->n0 ) // before the front?
+ {
+ if ( ( contribs->n1 - new_pixel + 1 ) <= max_width )
+ {
+ int j, o = contribs->n0 - new_pixel;
+ for ( j = contribs->n1 - contribs->n0 ; j >= 0 ; j-- )
+ coeffs[ j + o ] = coeffs[ j ];
+ for ( j = 1 ; j < o ; j++ )
+ coeffs[ j ] = 0;
+ coeffs[ 0 ] = new_coeff;
+ contribs->n0 = new_pixel;
+ }
+ }
+ else
+ {
+ coeffs[ new_pixel - contribs->n0 ] += new_coeff;
+ }
+ }
+ else
+ {
+ if ( ( new_pixel - contribs->n0 + 1 ) <= max_width )
+ {
+ int j, e = new_pixel - contribs->n0;
+ for( j = ( contribs->n1 - contribs->n0 ) + 1 ; j < e ; j++ ) // clear in-between coeffs if there are any
+ coeffs[j] = 0;
+
+ coeffs[ e ] = new_coeff;
+ contribs->n1 = new_pixel;
+ }
+ }
+}
+
+static void stbir__calculate_out_pixel_range( int * first_pixel, int * last_pixel, float in_pixel_center, float in_pixels_radius, float scale, float out_shift, int out_size )
+{
+ float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
+ float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
+ float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale - out_shift;
+ float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale - out_shift;
+ int out_first_pixel = (int)(STBIR_FLOORF(out_pixel_influence_lowerbound + 0.5f));
+ int out_last_pixel = (int)(STBIR_FLOORF(out_pixel_influence_upperbound - 0.5f));
+
+ if ( out_first_pixel < 0 )
+ out_first_pixel = 0;
+ if ( out_last_pixel >= out_size )
+ out_last_pixel = out_size - 1;
+ *first_pixel = out_first_pixel;
+ *last_pixel = out_last_pixel;
+}
+
+static void stbir__calculate_coefficients_for_gather_downsample( int start, int end, float in_pixels_radius, stbir__kernel_callback * kernel, stbir__scale_info * scale_info, int coefficient_width, int num_contributors, stbir__contributors * contributors, float * coefficient_group, void * user_data )
+{
+ int in_pixel;
+ int i;
+ int first_out_inited = -1;
+ float scale = scale_info->scale;
+ float out_shift = scale_info->pixel_shift;
+ int out_size = scale_info->output_sub_size;
+ int numerator = scale_info->scale_numerator;
+ int polyphase = ( ( scale_info->scale_is_rational ) && ( numerator < out_size ) );
+
+ STBIR__UNUSED(num_contributors);
+
+ // Loop through the input pixels
+ for (in_pixel = start; in_pixel < end; in_pixel++)
+ {
+ float in_pixel_center = (float)in_pixel + 0.5f;
+ float out_center_of_in = in_pixel_center * scale - out_shift;
+ int out_first_pixel, out_last_pixel;
+
+ stbir__calculate_out_pixel_range( &out_first_pixel, &out_last_pixel, in_pixel_center, in_pixels_radius, scale, out_shift, out_size );
+
+ if ( out_first_pixel > out_last_pixel )
+ continue;
+
+ // clamp or exit if we are using polyphase filtering, and the limit is up
+ if ( polyphase )
+ {
+ // when polyphase, you only have to do coeffs up to the numerator count
+ if ( out_first_pixel == numerator )
+ break;
+
+ // don't do any extra work, clamp last pixel at numerator too
+ if ( out_last_pixel >= numerator )
+ out_last_pixel = numerator - 1;
+ }
+
+ for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
+ {
+ float out_pixel_center = (float)(i + out_first_pixel) + 0.5f;
+ float x = out_pixel_center - out_center_of_in;
+ float coeff = kernel(x, scale, user_data) * scale;
+
+ // kill the coeff if it's too small (avoid denormals)
+ if ( ( ( coeff < stbir__small_float ) && ( coeff > -stbir__small_float ) ) )
+ coeff = 0.0f;
+
+ {
+ int out = i + out_first_pixel;
+ float * coeffs = coefficient_group + out * coefficient_width;
+ stbir__contributors * contribs = contributors + out;
+
+ // is this the first time this output pixel has been seen? Init it.
+ if ( out > first_out_inited )
+ {
+ STBIR_ASSERT( out == ( first_out_inited + 1 ) ); // ensure we have only advanced one at time
+ first_out_inited = out;
+ contribs->n0 = in_pixel;
+ contribs->n1 = in_pixel;
+ coeffs[0] = coeff;
+ }
+ else
+ {
+ // insert on end (always in order)
+ if ( coeffs[0] == 0.0f ) // if the first coefficient is zero, then zap it for this contrib
+ {
+ STBIR_ASSERT( ( in_pixel - contribs->n0 ) == 1 ); // ensure that when we zap, we're at the 2nd pos
+ contribs->n0 = in_pixel;
+ }
+ contribs->n1 = in_pixel;
+ STBIR_ASSERT( ( in_pixel - contribs->n0 ) < coefficient_width );
+ coeffs[in_pixel - contribs->n0] = coeff;
+ }
+ }
+ }
+ }
+}
+
+#ifdef STBIR_RENORMALIZE_IN_FLOAT
+#define STBIR_RENORM_TYPE float
+#else
+#define STBIR_RENORM_TYPE double
+#endif
+
+static void stbir__cleanup_gathered_coefficients( stbir_edge edge, stbir__filter_extent_info* filter_info, stbir__scale_info * scale_info, int num_contributors, stbir__contributors* contributors, float * coefficient_group, int coefficient_width )
+{
+ int input_size = scale_info->input_full_size;
+ int input_last_n1 = input_size - 1;
+ int n, end;
+ int lowest = 0x7fffffff;
+ int highest = -0x7fffffff;
+ int widest = -1;
+ int numerator = scale_info->scale_numerator;
+ int denominator = scale_info->scale_denominator;
+ int polyphase = ( ( scale_info->scale_is_rational ) && ( numerator < num_contributors ) );
+ float * coeffs;
+ stbir__contributors * contribs;
+
+ // weight all the coeffs for each sample
+ coeffs = coefficient_group;
+ contribs = contributors;
+ end = num_contributors; if ( polyphase ) end = numerator;
+ for (n = 0; n < end; n++)
+ {
+ int i;
+ STBIR_RENORM_TYPE filter_scale, total_filter = 0;
+ int e;
+
+ // add all contribs
+ e = contribs->n1 - contribs->n0;
+ for( i = 0 ; i <= e ; i++ )
+ {
+ total_filter += (STBIR_RENORM_TYPE) coeffs[i];
+ STBIR_ASSERT( ( coeffs[i] >= -2.0f ) && ( coeffs[i] <= 2.0f ) ); // check for wonky weights
+ }
+
+ // rescale
+ if ( ( total_filter < stbir__small_float ) && ( total_filter > -stbir__small_float ) )
+ {
+ // all coeffs are extremely small, just zero it
+ contribs->n1 = contribs->n0;
+ coeffs[0] = 0.0f;
+ }
+ else
+ {
+ // if the total isn't 1.0, rescale everything
+ if ( ( total_filter < (1.0f-stbir__small_float) ) || ( total_filter > (1.0f+stbir__small_float) ) )
+ {
+ filter_scale = ((STBIR_RENORM_TYPE)1.0) / total_filter;
+
+ // scale them all
+ for (i = 0; i <= e; i++)
+ coeffs[i] = (float) ( coeffs[i] * filter_scale );
+ }
+ }
+ ++contribs;
+ coeffs += coefficient_width;
+ }
+
+ // if we have a rational for the scale, we can exploit the polyphaseness to not calculate
+ // most of the coefficients, so we copy them here
+ if ( polyphase )
+ {
+ stbir__contributors * prev_contribs = contributors;
+ stbir__contributors * cur_contribs = contributors + numerator;
+
+ for( n = numerator ; n < num_contributors ; n++ )
+ {
+ cur_contribs->n0 = prev_contribs->n0 + denominator;
+ cur_contribs->n1 = prev_contribs->n1 + denominator;
+ ++cur_contribs;
+ ++prev_contribs;
+ }
+ stbir_overlapping_memcpy( coefficient_group + numerator * coefficient_width, coefficient_group, ( num_contributors - numerator ) * coefficient_width * sizeof( coeffs[ 0 ] ) );
+ }
+
+ coeffs = coefficient_group;
+ contribs = contributors;
+
+ for (n = 0; n < num_contributors; n++)
+ {
+ int i;
+
+ // in zero edge mode, just remove out of bounds contribs completely (since their weights are accounted for now)
+ if ( edge == STBIR_EDGE_ZERO )
+ {
+ // shrink the right side if necessary
+ if ( contribs->n1 > input_last_n1 )
+ contribs->n1 = input_last_n1;
+
+ // shrink the left side
+ if ( contribs->n0 < 0 )
+ {
+ int j, left, skips = 0;
+
+ skips = -contribs->n0;
+ contribs->n0 = 0;
+
+ // now move down the weights
+ left = contribs->n1 - contribs->n0 + 1;
+ if ( left > 0 )
+ {
+ for( j = 0 ; j < left ; j++ )
+ coeffs[ j ] = coeffs[ j + skips ];
+ }
+ }
+ }
+ else if ( ( edge == STBIR_EDGE_CLAMP ) || ( edge == STBIR_EDGE_REFLECT ) )
+ {
+ // for clamp and reflect, calculate the true inbounds position (based on edge type) and just add that to the existing weight
+
+ // right hand side first
+ if ( contribs->n1 > input_last_n1 )
+ {
+ int start = contribs->n0;
+ int endi = contribs->n1;
+ contribs->n1 = input_last_n1;
+ for( i = input_size; i <= endi; i++ )
+ stbir__insert_coeff( contribs, coeffs, stbir__edge_wrap_slow[edge]( i, input_size ), coeffs[i-start], coefficient_width );
+ }
+
+ // now check left hand edge
+ if ( contribs->n0 < 0 )
+ {
+ int save_n0;
+ float save_n0_coeff;
+ float * c = coeffs - ( contribs->n0 + 1 );
+
+ // reinsert the coeffs with it reflected or clamped (insert accumulates, if the coeffs exist)
+ for( i = -1 ; i > contribs->n0 ; i-- )
+ stbir__insert_coeff( contribs, coeffs, stbir__edge_wrap_slow[edge]( i, input_size ), *c--, coefficient_width );
+ save_n0 = contribs->n0;
+ save_n0_coeff = c[0]; // save it, since we didn't do the final one (i==n0), because there might be too many coeffs to hold (before we resize)!
+
+ // now slide all the coeffs down (since we have accumulated them in the positive contribs) and reset the first contrib
+ contribs->n0 = 0;
+ for(i = 0 ; i <= contribs->n1 ; i++ )
+ coeffs[i] = coeffs[i-save_n0];
+
+ // now that we have shrunk down the contribs, we insert the first one safely
+ stbir__insert_coeff( contribs, coeffs, stbir__edge_wrap_slow[edge]( save_n0, input_size ), save_n0_coeff, coefficient_width );
+ }
+ }
+
+ if ( contribs->n0 <= contribs->n1 )
+ {
+ int diff = contribs->n1 - contribs->n0 + 1;
+ while ( diff && ( coeffs[ diff-1 ] == 0.0f ) )
+ --diff;
+
+ contribs->n1 = contribs->n0 + diff - 1;
+
+ if ( contribs->n0 <= contribs->n1 )
+ {
+ if ( contribs->n0 < lowest )
+ lowest = contribs->n0;
+ if ( contribs->n1 > highest )
+ highest = contribs->n1;
+ if ( diff > widest )
+ widest = diff;
+ }
+
+ // re-zero out unused coefficients (if any)
+ for( i = diff ; i < coefficient_width ; i++ )
+ coeffs[i] = 0.0f;
+ }
+
+ ++contribs;
+ coeffs += coefficient_width;
+ }
+ filter_info->lowest = lowest;
+ filter_info->highest = highest;
+ filter_info->widest = widest;
+}
+
+#undef STBIR_RENORM_TYPE
+
+static int stbir__pack_coefficients( int num_contributors, stbir__contributors* contributors, float * coefficents, int coefficient_width, int widest, int row0, int row1 )
+{
+ #define STBIR_MOVE_1( dest, src ) { STBIR_NO_UNROLL(dest); ((stbir_uint32*)(dest))[0] = ((stbir_uint32*)(src))[0]; }
+ #define STBIR_MOVE_2( dest, src ) { STBIR_NO_UNROLL(dest); ((stbir_uint64*)(dest))[0] = ((stbir_uint64*)(src))[0]; }
+ #ifdef STBIR_SIMD
+ #define STBIR_MOVE_4( dest, src ) { stbir__simdf t; STBIR_NO_UNROLL(dest); stbir__simdf_load( t, src ); stbir__simdf_store( dest, t ); }
+ #else
+ #define STBIR_MOVE_4( dest, src ) { STBIR_NO_UNROLL(dest); ((stbir_uint64*)(dest))[0] = ((stbir_uint64*)(src))[0]; ((stbir_uint64*)(dest))[1] = ((stbir_uint64*)(src))[1]; }
+ #endif
+
+ int row_end = row1 + 1;
+ STBIR__UNUSED( row0 ); // only used in an assert
+
+ if ( coefficient_width != widest )
+ {
+ float * pc = coefficents;
+ float * coeffs = coefficents;
+ float * pc_end = coefficents + num_contributors * widest;
+ switch( widest )
+ {
+ case 1:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_1( pc, coeffs );
+ ++pc;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 2:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_2( pc, coeffs );
+ pc += 2;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 3:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_2( pc, coeffs );
+ STBIR_MOVE_1( pc+2, coeffs+2 );
+ pc += 3;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 4:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ pc += 4;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 5:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_1( pc+4, coeffs+4 );
+ pc += 5;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 6:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_2( pc+4, coeffs+4 );
+ pc += 6;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 7:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_2( pc+4, coeffs+4 );
+ STBIR_MOVE_1( pc+6, coeffs+6 );
+ pc += 7;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 8:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_4( pc+4, coeffs+4 );
+ pc += 8;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 9:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_4( pc+4, coeffs+4 );
+ STBIR_MOVE_1( pc+8, coeffs+8 );
+ pc += 9;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 10:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_4( pc+4, coeffs+4 );
+ STBIR_MOVE_2( pc+8, coeffs+8 );
+ pc += 10;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 11:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_4( pc+4, coeffs+4 );
+ STBIR_MOVE_2( pc+8, coeffs+8 );
+ STBIR_MOVE_1( pc+10, coeffs+10 );
+ pc += 11;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ case 12:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ STBIR_MOVE_4( pc, coeffs );
+ STBIR_MOVE_4( pc+4, coeffs+4 );
+ STBIR_MOVE_4( pc+8, coeffs+8 );
+ pc += 12;
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ default:
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ float * copy_end = pc + widest - 4;
+ float * c = coeffs;
+ do {
+ STBIR_NO_UNROLL( pc );
+ STBIR_MOVE_4( pc, c );
+ pc += 4;
+ c += 4;
+ } while ( pc <= copy_end );
+ copy_end += 4;
+ STBIR_NO_UNROLL_LOOP_START
+ while ( pc < copy_end )
+ {
+ STBIR_MOVE_1( pc, c );
+ ++pc; ++c;
+ }
+ coeffs += coefficient_width;
+ } while ( pc < pc_end );
+ break;
+ }
+ }
+
+ // some horizontal routines read one float off the end (which is then masked off), so put in a sentinel so we don't read an sNaN or denormal
+ coefficents[ widest * num_contributors ] = 8888.0f;
+
+ // the minimum we might read for unrolled filter widths is 12. So, we need to
+ // make sure we never read outside the decode buffer, by possibly moving
+ // the sample area back into the scanline, and putting zero weights first.
+ // we start on the right edge and check until we're well past the possible
+ // clip area (2*widest).
+ {
+ stbir__contributors * contribs = contributors + num_contributors - 1;
+ float * coeffs = coefficents + widest * ( num_contributors - 1 );
+
+ // go until no chance of clipping (this is usually less than 8 loops)
+ while ( ( contribs >= contributors ) && ( ( contribs->n0 + widest*2 ) >= row_end ) )
+ {
+ // might we clip??
+ if ( ( contribs->n0 + widest ) > row_end )
+ {
+ int stop_range = widest;
+
+ // if range is larger than 12, it will be handled by generic loops that can terminate on the exact length
+ // of this contrib n1, instead of a fixed widest amount - so calculate this
+ if ( widest > 12 )
+ {
+ int mod;
+
+ // how far the n_coeff loop will read (depends on the widest count mod 4)
+ mod = widest & 3;
+ stop_range = ( ( ( contribs->n1 - contribs->n0 + 1 ) - mod + 3 ) & ~3 ) + mod;
+
+ // the n_coeff loops do a minimum amount of coeffs, so factor that in!
+ if ( stop_range < ( 8 + mod ) ) stop_range = 8 + mod;
+ }
+
+ // now see if we still clip with the refined range
+ if ( ( contribs->n0 + stop_range ) > row_end )
+ {
+ int new_n0 = row_end - stop_range;
+ int num = contribs->n1 - contribs->n0 + 1;
+ int backup = contribs->n0 - new_n0;
+ float * from_co = coeffs + num - 1;
+ float * to_co = from_co + backup;
+
+ STBIR_ASSERT( ( new_n0 >= row0 ) && ( new_n0 < contribs->n0 ) );
+
+ // move the coeffs over
+ while( num )
+ {
+ *to_co-- = *from_co--;
+ --num;
+ }
+ // zero new positions
+ while ( to_co >= coeffs )
+ *to_co-- = 0;
+ // set new start point
+ contribs->n0 = new_n0;
+ if ( widest > 12 )
+ {
+ int mod;
+
+ // how far the n_coeff loop will read (depends on the widest count mod 4)
+ mod = widest & 3;
+ stop_range = ( ( ( contribs->n1 - contribs->n0 + 1 ) - mod + 3 ) & ~3 ) + mod;
+
+ // the n_coeff loops do a minimum amount of coeffs, so factor that in!
+ if ( stop_range < ( 8 + mod ) ) stop_range = 8 + mod;
+ }
+ }
+ }
+ --contribs;
+ coeffs -= widest;
+ }
+ }
+
+ return widest;
+ #undef STBIR_MOVE_1
+ #undef STBIR_MOVE_2
+ #undef STBIR_MOVE_4
+}
+
+static void stbir__calculate_filters( stbir__sampler * samp, stbir__sampler * other_axis_for_pivot, void * user_data STBIR_ONLY_PROFILE_BUILD_GET_INFO )
+{
+ int n;
+ float scale = samp->scale_info.scale;
+ stbir__kernel_callback * kernel = samp->filter_kernel;
+ stbir__support_callback * support = samp->filter_support;
+ float inv_scale = samp->scale_info.inv_scale;
+ int input_full_size = samp->scale_info.input_full_size;
+ int gather_num_contributors = samp->num_contributors;
+ stbir__contributors* gather_contributors = samp->contributors;
+ float * gather_coeffs = samp->coefficients;
+ int gather_coefficient_width = samp->coefficient_width;
+
+ switch ( samp->is_gather )
+ {
+ case 1: // gather upsample
+ {
+ float out_pixels_radius = support(inv_scale,user_data) * scale;
+
+ stbir__calculate_coefficients_for_gather_upsample( out_pixels_radius, kernel, &samp->scale_info, gather_num_contributors, gather_contributors, gather_coeffs, gather_coefficient_width, samp->edge, user_data );
+
+ STBIR_PROFILE_BUILD_START( cleanup );
+ stbir__cleanup_gathered_coefficients( samp->edge, &samp->extent_info, &samp->scale_info, gather_num_contributors, gather_contributors, gather_coeffs, gather_coefficient_width );
+ STBIR_PROFILE_BUILD_END( cleanup );
+ }
+ break;
+
+ case 0: // scatter downsample (only on vertical)
+ case 2: // gather downsample
+ {
+ float in_pixels_radius = support(scale,user_data) * inv_scale;
+ int filter_pixel_margin = samp->filter_pixel_margin;
+ int input_end = input_full_size + filter_pixel_margin;
+
+ // if this is a scatter, we do a downsample gather to get the coeffs, and then pivot after
+ if ( !samp->is_gather )
+ {
+ // check if we are using the same gather downsample on the horizontal as this vertical,
+ // if so, then we don't have to generate them, we can just pivot from the horizontal.
+ if ( other_axis_for_pivot )
+ {
+ gather_contributors = other_axis_for_pivot->contributors;
+ gather_coeffs = other_axis_for_pivot->coefficients;
+ gather_coefficient_width = other_axis_for_pivot->coefficient_width;
+ gather_num_contributors = other_axis_for_pivot->num_contributors;
+ samp->extent_info.lowest = other_axis_for_pivot->extent_info.lowest;
+ samp->extent_info.highest = other_axis_for_pivot->extent_info.highest;
+ samp->extent_info.widest = other_axis_for_pivot->extent_info.widest;
+ goto jump_right_to_pivot;
+ }
+
+ gather_contributors = samp->gather_prescatter_contributors;
+ gather_coeffs = samp->gather_prescatter_coefficients;
+ gather_coefficient_width = samp->gather_prescatter_coefficient_width;
+ gather_num_contributors = samp->gather_prescatter_num_contributors;
+ }
+
+ stbir__calculate_coefficients_for_gather_downsample( -filter_pixel_margin, input_end, in_pixels_radius, kernel, &samp->scale_info, gather_coefficient_width, gather_num_contributors, gather_contributors, gather_coeffs, user_data );
+
+ STBIR_PROFILE_BUILD_START( cleanup );
+ stbir__cleanup_gathered_coefficients( samp->edge, &samp->extent_info, &samp->scale_info, gather_num_contributors, gather_contributors, gather_coeffs, gather_coefficient_width );
+ STBIR_PROFILE_BUILD_END( cleanup );
+
+ if ( !samp->is_gather )
+ {
+ // if this is a scatter (vertical only), then we need to pivot the coeffs
+ stbir__contributors * scatter_contributors;
+ int highest_set;
+
+ jump_right_to_pivot:
+
+ STBIR_PROFILE_BUILD_START( pivot );
+
+ highest_set = (-filter_pixel_margin) - 1;
+ for (n = 0; n < gather_num_contributors; n++)
+ {
+ int k;
+ int gn0 = gather_contributors->n0, gn1 = gather_contributors->n1;
+ int scatter_coefficient_width = samp->coefficient_width;
+ float * scatter_coeffs = samp->coefficients + ( gn0 + filter_pixel_margin ) * scatter_coefficient_width;
+ float * g_coeffs = gather_coeffs;
+ scatter_contributors = samp->contributors + ( gn0 + filter_pixel_margin );
+
+ for (k = gn0 ; k <= gn1 ; k++ )
+ {
+ float gc = *g_coeffs++;
+
+ // skip zero and denormals - must skip zeros to avoid adding coeffs beyond scatter_coefficient_width
+ // (which happens when pivoting from horizontal, which might have dummy zeros)
+ if ( ( ( gc >= stbir__small_float ) || ( gc <= -stbir__small_float ) ) )
+ {
+ if ( ( k > highest_set ) || ( scatter_contributors->n0 > scatter_contributors->n1 ) )
+ {
+ {
+ // if we are skipping over several contributors, we need to clear the skipped ones
+ stbir__contributors * clear_contributors = samp->contributors + ( highest_set + filter_pixel_margin + 1);
+ while ( clear_contributors < scatter_contributors )
+ {
+ clear_contributors->n0 = 0;
+ clear_contributors->n1 = -1;
+ ++clear_contributors;
+ }
+ }
+ scatter_contributors->n0 = n;
+ scatter_contributors->n1 = n;
+ scatter_coeffs[0] = gc;
+ highest_set = k;
+ }
+ else
+ {
+ stbir__insert_coeff( scatter_contributors, scatter_coeffs, n, gc, scatter_coefficient_width );
+ }
+ STBIR_ASSERT( ( scatter_contributors->n1 - scatter_contributors->n0 + 1 ) <= scatter_coefficient_width );
+ }
+ ++scatter_contributors;
+ scatter_coeffs += scatter_coefficient_width;
+ }
+
+ ++gather_contributors;
+ gather_coeffs += gather_coefficient_width;
+ }
+
+ // now clear any unset contribs
+ {
+ stbir__contributors * clear_contributors = samp->contributors + ( highest_set + filter_pixel_margin + 1);
+ stbir__contributors * end_contributors = samp->contributors + samp->num_contributors;
+ while ( clear_contributors < end_contributors )
+ {
+ clear_contributors->n0 = 0;
+ clear_contributors->n1 = -1;
+ ++clear_contributors;
+ }
+ }
+
+ STBIR_PROFILE_BUILD_END( pivot );
+ }
+ }
+ break;
+ }
+}
+
+
+//========================================================================================================
+// scanline decoders and encoders
+
+#define stbir__coder_min_num 1
+#define STB_IMAGE_RESIZE_DO_CODERS
+#include STBIR__HEADER_FILENAME
+
+#define stbir__decode_suffix BGRA
+#define stbir__decode_swizzle
+#define stbir__decode_order0 2
+#define stbir__decode_order1 1
+#define stbir__decode_order2 0
+#define stbir__decode_order3 3
+#define stbir__encode_order0 2
+#define stbir__encode_order1 1
+#define stbir__encode_order2 0
+#define stbir__encode_order3 3
+#define stbir__coder_min_num 4
+#define STB_IMAGE_RESIZE_DO_CODERS
+#include STBIR__HEADER_FILENAME
+
+#define stbir__decode_suffix ARGB
+#define stbir__decode_swizzle
+#define stbir__decode_order0 1
+#define stbir__decode_order1 2
+#define stbir__decode_order2 3
+#define stbir__decode_order3 0
+#define stbir__encode_order0 3
+#define stbir__encode_order1 0
+#define stbir__encode_order2 1
+#define stbir__encode_order3 2
+#define stbir__coder_min_num 4
+#define STB_IMAGE_RESIZE_DO_CODERS
+#include STBIR__HEADER_FILENAME
+
+#define stbir__decode_suffix ABGR
+#define stbir__decode_swizzle
+#define stbir__decode_order0 3
+#define stbir__decode_order1 2
+#define stbir__decode_order2 1
+#define stbir__decode_order3 0
+#define stbir__encode_order0 3
+#define stbir__encode_order1 2
+#define stbir__encode_order2 1
+#define stbir__encode_order3 0
+#define stbir__coder_min_num 4
+#define STB_IMAGE_RESIZE_DO_CODERS
+#include STBIR__HEADER_FILENAME
+
+#define stbir__decode_suffix AR
+#define stbir__decode_swizzle
+#define stbir__decode_order0 1
+#define stbir__decode_order1 0
+#define stbir__decode_order2 3
+#define stbir__decode_order3 2
+#define stbir__encode_order0 1
+#define stbir__encode_order1 0
+#define stbir__encode_order2 3
+#define stbir__encode_order3 2
+#define stbir__coder_min_num 2
+#define STB_IMAGE_RESIZE_DO_CODERS
+#include STBIR__HEADER_FILENAME
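Each swizzled coder above is generated by re-including the header with a channel-order table. As a standalone sketch (illustrative names, not the library's actual coder machinery): slot N of the internal RGBA decode buffer reads interleaved input channel `stbir__decode_orderN`, so BGRA's order {2,1,0,3} decodes memory order B G R A into internal R G B A.

```c
#include <assert.h>

// illustrative sketch of what the stbir__decode_orderN values mean: for
// each output slot c of the internal RGBA decode buffer, read interleaved
// input channel order[c]. not the library's real coder, just the mapping.
static void decode_with_order(float *out, const float *in, int pixels,
                              const int order[4])
{
    for (int i = 0; i < pixels; i++)
        for (int c = 0; c < 4; c++)
            out[i * 4 + c] = in[i * 4 + order[c]];
}
```

With the BGRA table {2,1,0,3}, a B G R A pixel in memory lands in the decode buffer as R G B A.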
+
+
+// fancy alpha means we expand to keep both premultiplied and non-premultiplied color channels
+static void stbir__fancy_alpha_weight_4ch( float * out_buffer, int width_times_channels )
+{
+ float STBIR_STREAMOUT_PTR(*) out = out_buffer;
+ float const * end_decode = out_buffer + ( width_times_channels / 4 ) * 7; // decode buffer aligned to end of out_buffer
+ float STBIR_STREAMOUT_PTR(*) decode = (float*)end_decode - width_times_channels;
+
+ // fancy alpha is stored internally as R G B A Rpm Gpm Bpm
+
+ #ifdef STBIR_SIMD
+
+ #ifdef STBIR_SIMD8
+ decode += 16;
+ STBIR_NO_UNROLL_LOOP_START
+ while ( decode <= end_decode )
+ {
+ stbir__simdf8 d0,d1,a0,a1,p0,p1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdf8_load( d0, decode-16 );
+ stbir__simdf8_load( d1, decode-16+8 );
+ stbir__simdf8_0123to33333333( a0, d0 );
+ stbir__simdf8_0123to33333333( a1, d1 );
+ stbir__simdf8_mult( p0, a0, d0 );
+ stbir__simdf8_mult( p1, a1, d1 );
+ stbir__simdf8_bot4s( a0, d0, p0 );
+ stbir__simdf8_bot4s( a1, d1, p1 );
+ stbir__simdf8_top4s( d0, d0, p0 );
+ stbir__simdf8_top4s( d1, d1, p1 );
+ stbir__simdf8_store ( out, a0 );
+ stbir__simdf8_store ( out+7, d0 );
+ stbir__simdf8_store ( out+14, a1 );
+ stbir__simdf8_store ( out+21, d1 );
+ decode += 16;
+ out += 28;
+ }
+ decode -= 16;
+ #else
+ decode += 8;
+ STBIR_NO_UNROLL_LOOP_START
+ while ( decode <= end_decode )
+ {
+ stbir__simdf d0,a0,d1,a1,p0,p1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdf_load( d0, decode-8 );
+ stbir__simdf_load( d1, decode-8+4 );
+ stbir__simdf_0123to3333( a0, d0 );
+ stbir__simdf_0123to3333( a1, d1 );
+ stbir__simdf_mult( p0, a0, d0 );
+ stbir__simdf_mult( p1, a1, d1 );
+ stbir__simdf_store ( out, d0 );
+ stbir__simdf_store ( out+4, p0 );
+ stbir__simdf_store ( out+7, d1 );
+ stbir__simdf_store ( out+7+4, p1 );
+ decode += 8;
+ out += 14;
+ }
+ decode -= 8;
+ #endif
+
+ // might be one last odd pixel
+ #ifdef STBIR_SIMD8
+ STBIR_NO_UNROLL_LOOP_START
+ while ( decode < end_decode )
+ #else
+ if ( decode < end_decode )
+ #endif
+ {
+ stbir__simdf d,a,p;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdf_load( d, decode );
+ stbir__simdf_0123to3333( a, d );
+ stbir__simdf_mult( p, a, d );
+ stbir__simdf_store ( out, d );
+ stbir__simdf_store ( out+4, p );
+ decode += 4;
+ out += 7;
+ }
+
+ #else
+
+ while( decode < end_decode )
+ {
+ float r = decode[0], g = decode[1], b = decode[2], alpha = decode[3];
+ out[0] = r;
+ out[1] = g;
+ out[2] = b;
+ out[3] = alpha;
+ out[4] = r * alpha;
+ out[5] = g * alpha;
+ out[6] = b * alpha;
+ out += 7;
+ decode += 4;
+ }
+
+ #endif
+}
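The scalar fallback above shows the whole transform: each R G B A pixel expands in place to seven floats, R G B A Rpm Gpm Bpm. A self-contained sketch (hypothetical name, mirroring the scalar path) that also demonstrates why the input is right-aligned at the end of the buffer, so the front-to-back expansion never overwrites unread pixels:

```c
#include <assert.h>

// sketch of stbir__fancy_alpha_weight_4ch's scalar path: input RGBA pixels
// sit right-aligned at the end of out_buffer; each expands to 7 floats
// (R G B A R*A G*A B*A). walking front-to-back, writes for pixel i never
// pass the read cursor, so the expansion is safe in place.
static void fancy_alpha_weight_4ch_sketch(float *out_buffer, int width_times_channels)
{
    float *out = out_buffer;
    float *end = out_buffer + (width_times_channels / 4) * 7;
    float *decode = end - width_times_channels;   // input right-aligned at end
    while (decode < end)
    {
        float r = decode[0], g = decode[1], b = decode[2], a = decode[3];
        out[0] = r; out[1] = g; out[2] = b; out[3] = a;
        out[4] = r * a; out[5] = g * a; out[6] = b * a;
        out += 7; decode += 4;
    }
}
```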
+
+static void stbir__fancy_alpha_weight_2ch( float * out_buffer, int width_times_channels )
+{
+ float STBIR_STREAMOUT_PTR(*) out = out_buffer;
+ float const * end_decode = out_buffer + ( width_times_channels / 2 ) * 3;
+ float STBIR_STREAMOUT_PTR(*) decode = (float*)end_decode - width_times_channels;
+
+ // for fancy alpha, turns into: [X A Xpm][X A Xpm], etc.
+
+ #ifdef STBIR_SIMD
+
+ decode += 8;
+ if ( decode <= end_decode )
+ {
+ STBIR_NO_UNROLL_LOOP_START
+ do {
+ #ifdef STBIR_SIMD8
+ stbir__simdf8 d0,a0,p0;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdf8_load( d0, decode-8 );
+ stbir__simdf8_0123to11331133( p0, d0 );
+ stbir__simdf8_0123to00220022( a0, d0 );
+ stbir__simdf8_mult( p0, p0, a0 );
+
+ stbir__simdf_store2( out, stbir__if_simdf8_cast_to_simdf4( d0 ) );
+ stbir__simdf_store( out+2, stbir__if_simdf8_cast_to_simdf4( p0 ) );
+ stbir__simdf_store2h( out+3, stbir__if_simdf8_cast_to_simdf4( d0 ) );
+
+ stbir__simdf_store2( out+6, stbir__simdf8_gettop4( d0 ) );
+ stbir__simdf_store( out+8, stbir__simdf8_gettop4( p0 ) );
+ stbir__simdf_store2h( out+9, stbir__simdf8_gettop4( d0 ) );
+ #else
+ stbir__simdf d0,a0,d1,a1,p0,p1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdf_load( d0, decode-8 );
+ stbir__simdf_load( d1, decode-8+4 );
+ stbir__simdf_0123to1133( p0, d0 );
+ stbir__simdf_0123to1133( p1, d1 );
+ stbir__simdf_0123to0022( a0, d0 );
+ stbir__simdf_0123to0022( a1, d1 );
+ stbir__simdf_mult( p0, p0, a0 );
+ stbir__simdf_mult( p1, p1, a1 );
+
+ stbir__simdf_store2( out, d0 );
+ stbir__simdf_store( out+2, p0 );
+ stbir__simdf_store2h( out+3, d0 );
+
+ stbir__simdf_store2( out+6, d1 );
+ stbir__simdf_store( out+8, p1 );
+ stbir__simdf_store2h( out+9, d1 );
+ #endif
+ decode += 8;
+ out += 12;
+ } while ( decode <= end_decode );
+ }
+ decode -= 8;
+ #endif
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode < end_decode )
+ {
+ float x = decode[0], y = decode[1];
+ STBIR_SIMD_NO_UNROLL(decode);
+ out[0] = x;
+ out[1] = y;
+ out[2] = x * y;
+ out += 3;
+ decode += 2;
+ }
+}
+
+static void stbir__fancy_alpha_unweight_4ch( float * encode_buffer, int width_times_channels )
+{
+ float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
+ float STBIR_SIMD_STREAMOUT_PTR(*) input = encode_buffer;
+ float const * end_output = encode_buffer + width_times_channels;
+
+ // fancy RGBA is stored internally as R G B A Rpm Gpm Bpm
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float alpha = input[3];
+#ifdef STBIR_SIMD
+ stbir__simdf i,ia;
+ STBIR_SIMD_NO_UNROLL(encode);
+ if ( alpha < stbir__small_float )
+ {
+ stbir__simdf_load( i, input );
+ stbir__simdf_store( encode, i );
+ }
+ else
+ {
+ stbir__simdf_load1frep4( ia, 1.0f / alpha );
+ stbir__simdf_load( i, input+4 );
+ stbir__simdf_mult( i, i, ia );
+ stbir__simdf_store( encode, i );
+ encode[3] = alpha;
+ }
+#else
+ if ( alpha < stbir__small_float )
+ {
+ encode[0] = input[0];
+ encode[1] = input[1];
+ encode[2] = input[2];
+ }
+ else
+ {
+ float ialpha = 1.0f / alpha;
+ encode[0] = input[4] * ialpha;
+ encode[1] = input[5] * ialpha;
+ encode[2] = input[6] * ialpha;
+ }
+ encode[3] = alpha;
+#endif
+
+ input += 7;
+ encode += 4;
+ } while ( encode < end_output );
+}
+
+// format: [X A Xpm][X A Xpm] etc
+static void stbir__fancy_alpha_unweight_2ch( float * encode_buffer, int width_times_channels )
+{
+ float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
+ float STBIR_SIMD_STREAMOUT_PTR(*) input = encode_buffer;
+ float const * end_output = encode_buffer + width_times_channels;
+
+ do {
+ float alpha = input[1];
+ encode[0] = input[0];
+ if ( alpha >= stbir__small_float )
+ encode[0] = input[2] / alpha;
+ encode[1] = alpha;
+
+ input += 3;
+ encode += 2;
+ } while ( encode < end_output );
+}
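The unweight step above is the inverse of the two-channel expansion: when alpha clears the small-float threshold, the premultiplied value divided by alpha recovers X. A minimal round-trip sketch (hypothetical helper names; `SMALL_ALPHA_2CH` stands in for the library's `stbir__small_float`):

```c
#include <assert.h>

#define SMALL_ALPHA_2CH 1e-20f  /* stand-in for stbir__small_float */

// round-trip sketch for the 2-channel fancy-alpha format [X A X*A]:
// weight expands one X,A pair into three floats; unweight divides the
// premultiplied value back out, falling back to the stored X when alpha
// is too small to divide by safely.
static void fancy_weight_2ch_sketch(float *buf3, float x, float a)
{
    buf3[0] = x; buf3[1] = a; buf3[2] = x * a;
}

static void fancy_unweight_2ch_sketch(float *out2, const float *buf3)
{
    float a = buf3[1];
    out2[0] = (a >= SMALL_ALPHA_2CH) ? buf3[2] / a : buf3[0];
    out2[1] = a;
}
```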
+
+static void stbir__simple_alpha_weight_4ch( float * decode_buffer, int width_times_channels )
+{
+ float STBIR_STREAMOUT_PTR(*) decode = decode_buffer;
+ float const * end_decode = decode_buffer + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ {
+ decode += 2 * stbir__simdfX_float_count;
+ STBIR_NO_UNROLL_LOOP_START
+ while ( decode <= end_decode )
+ {
+ stbir__simdfX d0,a0,d1,a1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdfX_load( d0, decode-2*stbir__simdfX_float_count );
+ stbir__simdfX_load( d1, decode-2*stbir__simdfX_float_count+stbir__simdfX_float_count );
+ stbir__simdfX_aaa1( a0, d0, STBIR_onesX );
+ stbir__simdfX_aaa1( a1, d1, STBIR_onesX );
+ stbir__simdfX_mult( d0, d0, a0 );
+ stbir__simdfX_mult( d1, d1, a1 );
+ stbir__simdfX_store ( decode-2*stbir__simdfX_float_count, d0 );
+ stbir__simdfX_store ( decode-2*stbir__simdfX_float_count+stbir__simdfX_float_count, d1 );
+ decode += 2 * stbir__simdfX_float_count;
+ }
+ decode -= 2 * stbir__simdfX_float_count;
+
+ // handle the last few pixel remnants
+ #ifdef STBIR_SIMD8
+ STBIR_NO_UNROLL_LOOP_START
+ while ( decode < end_decode )
+ #else
+ if ( decode < end_decode )
+ #endif
+ {
+ stbir__simdf d,a;
+ stbir__simdf_load( d, decode );
+ stbir__simdf_aaa1( a, d, STBIR__CONSTF(STBIR_ones) );
+ stbir__simdf_mult( d, d, a );
+ stbir__simdf_store ( decode, d );
+ decode += 4;
+ }
+ }
+
+ #else
+
+ while( decode < end_decode )
+ {
+ float alpha = decode[3];
+ decode[0] *= alpha;
+ decode[1] *= alpha;
+ decode[2] *= alpha;
+ decode += 4;
+ }
+
+ #endif
+}
+
+static void stbir__simple_alpha_weight_2ch( float * decode_buffer, int width_times_channels )
+{
+ float STBIR_STREAMOUT_PTR(*) decode = decode_buffer;
+ float const * end_decode = decode_buffer + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ decode += 2 * stbir__simdfX_float_count;
+ STBIR_NO_UNROLL_LOOP_START
+ while ( decode <= end_decode )
+ {
+ stbir__simdfX d0,a0,d1,a1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdfX_load( d0, decode-2*stbir__simdfX_float_count );
+ stbir__simdfX_load( d1, decode-2*stbir__simdfX_float_count+stbir__simdfX_float_count );
+ stbir__simdfX_a1a1( a0, d0, STBIR_onesX );
+ stbir__simdfX_a1a1( a1, d1, STBIR_onesX );
+ stbir__simdfX_mult( d0, d0, a0 );
+ stbir__simdfX_mult( d1, d1, a1 );
+ stbir__simdfX_store ( decode-2*stbir__simdfX_float_count, d0 );
+ stbir__simdfX_store ( decode-2*stbir__simdfX_float_count+stbir__simdfX_float_count, d1 );
+ decode += 2 * stbir__simdfX_float_count;
+ }
+ decode -= 2 * stbir__simdfX_float_count;
+ #endif
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode < end_decode )
+ {
+ float alpha = decode[1];
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0] *= alpha;
+ decode += 2;
+ }
+}
+
+static void stbir__simple_alpha_unweight_4ch( float * encode_buffer, int width_times_channels )
+{
+ float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
+ float const * end_output = encode_buffer + width_times_channels;
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float alpha = encode[3];
+
+#ifdef STBIR_SIMD
+ stbir__simdf i,ia;
+ STBIR_SIMD_NO_UNROLL(encode);
+ if ( alpha >= stbir__small_float )
+ {
+ stbir__simdf_load1frep4( ia, 1.0f / alpha );
+ stbir__simdf_load( i, encode );
+ stbir__simdf_mult( i, i, ia );
+ stbir__simdf_store( encode, i );
+ encode[3] = alpha;
+ }
+#else
+ if ( alpha >= stbir__small_float )
+ {
+ float ialpha = 1.0f / alpha;
+ encode[0] *= ialpha;
+ encode[1] *= ialpha;
+ encode[2] *= ialpha;
+ }
+#endif
+ encode += 4;
+ } while ( encode < end_output );
+}
+
+static void stbir__simple_alpha_unweight_2ch( float * encode_buffer, int width_times_channels )
+{
+ float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
+ float const * end_output = encode_buffer + width_times_channels;
+
+ do {
+ float alpha = encode[1];
+ if ( alpha >= stbir__small_float )
+ encode[0] /= alpha;
+ encode += 2;
+ } while ( encode < end_output );
+}
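The "simple" variants above premultiply in place and later divide alpha back out, with a guard so near-zero alpha is never divided by (those channels are simply left premultiplied, i.e. near zero). A hedged scalar sketch of the pair, with `SMALL_ALPHA_4CH` standing in for `stbir__small_float`:

```c
#include <assert.h>

#define SMALL_ALPHA_4CH 1e-20f  /* stand-in for stbir__small_float */

// sketch of the scalar paths of stbir__simple_alpha_weight_4ch /
// _unweight_4ch: weight multiplies RGB by A in place; unweight divides
// it back out, skipping the division when alpha is (nearly) zero.
static void simple_weight_4ch_sketch(float *p, int pixels)
{
    for (int i = 0; i < pixels; i++, p += 4)
    {
        p[0] *= p[3]; p[1] *= p[3]; p[2] *= p[3];
    }
}

static void simple_unweight_4ch_sketch(float *p, int pixels)
{
    for (int i = 0; i < pixels; i++, p += 4)
        if (p[3] >= SMALL_ALPHA_4CH)
        {
            float ia = 1.0f / p[3];
            p[0] *= ia; p[1] *= ia; p[2] *= ia;
        }
}
```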
+
+
+// only used in RGB->BGR or BGR->RGB
+static void stbir__simple_flip_3ch( float * decode_buffer, int width_times_channels )
+{
+ float STBIR_STREAMOUT_PTR(*) decode = decode_buffer;
+ float const * end_decode = decode_buffer + width_times_channels;
+
+#ifdef STBIR_SIMD
+ #ifdef stbir__simdf_swiz2 // do we have two argument swizzles?
+ end_decode -= 12;
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode <= end_decode )
+ {
+ // on arm64 8 instructions, no overlapping stores
+ stbir__simdf a,b,c,na,nb;
+ STBIR_SIMD_NO_UNROLL(decode);
+ stbir__simdf_load( a, decode );
+ stbir__simdf_load( b, decode+4 );
+ stbir__simdf_load( c, decode+8 );
+
+ na = stbir__simdf_swiz2( a, b, 2, 1, 0, 5 );
+ b = stbir__simdf_swiz2( a, b, 4, 3, 6, 7 );
+ nb = stbir__simdf_swiz2( b, c, 0, 1, 4, 3 );
+ c = stbir__simdf_swiz2( b, c, 2, 7, 6, 5 );
+
+ stbir__simdf_store( decode, na );
+ stbir__simdf_store( decode+4, nb );
+ stbir__simdf_store( decode+8, c );
+ decode += 12;
+ }
+ end_decode += 12;
+ #else
+ end_decode -= 24;
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode <= end_decode )
+ {
+ // 26 instructions on x64
+ stbir__simdf a,b,c,d,e,f,g;
+ float i21, i23;
+ STBIR_SIMD_NO_UNROLL(decode);
+ stbir__simdf_load( a, decode );
+ stbir__simdf_load( b, decode+3 );
+ stbir__simdf_load( c, decode+6 );
+ stbir__simdf_load( d, decode+9 );
+ stbir__simdf_load( e, decode+12 );
+ stbir__simdf_load( f, decode+15 );
+ stbir__simdf_load( g, decode+18 );
+
+ a = stbir__simdf_swiz( a, 2, 1, 0, 3 );
+ b = stbir__simdf_swiz( b, 2, 1, 0, 3 );
+ c = stbir__simdf_swiz( c, 2, 1, 0, 3 );
+ d = stbir__simdf_swiz( d, 2, 1, 0, 3 );
+ e = stbir__simdf_swiz( e, 2, 1, 0, 3 );
+ f = stbir__simdf_swiz( f, 2, 1, 0, 3 );
+ g = stbir__simdf_swiz( g, 2, 1, 0, 3 );
+
+ // stores overlap, so they need to be done in order
+ stbir__simdf_store( decode, a );
+ i21 = decode[21];
+ stbir__simdf_store( decode+3, b );
+ i23 = decode[23];
+ stbir__simdf_store( decode+6, c );
+ stbir__simdf_store( decode+9, d );
+ stbir__simdf_store( decode+12, e );
+ stbir__simdf_store( decode+15, f );
+ stbir__simdf_store( decode+18, g );
+ decode[21] = i23;
+ decode[23] = i21;
+ decode += 24;
+ }
+ end_decode += 24;
+ #endif
+#else
+ end_decode -= 12;
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode <= end_decode )
+ {
+ // 16 instructions
+ float t0,t1,t2,t3;
+ STBIR_NO_UNROLL(decode);
+ t0 = decode[0]; t1 = decode[3]; t2 = decode[6]; t3 = decode[9];
+ decode[0] = decode[2]; decode[3] = decode[5]; decode[6] = decode[8]; decode[9] = decode[11];
+ decode[2] = t0; decode[5] = t1; decode[8] = t2; decode[11] = t3;
+ decode += 12;
+ }
+ end_decode += 12;
+#endif
+
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < end_decode )
+ {
+ float t = decode[0];
+ STBIR_NO_UNROLL(decode);
+ decode[0] = decode[2];
+ decode[2] = t;
+ decode += 3;
+ }
+}
+
+
+
+static void stbir__decode_scanline(stbir__info const * stbir_info, int n, float * output_buffer STBIR_ONLY_PROFILE_GET_SPLIT_INFO )
+{
+ int channels = stbir_info->channels;
+ int effective_channels = stbir_info->effective_channels;
+ int input_sample_in_bytes = stbir__type_size[stbir_info->input_type] * channels;
+ stbir_edge edge_horizontal = stbir_info->horizontal.edge;
+ stbir_edge edge_vertical = stbir_info->vertical.edge;
+ int row = stbir__edge_wrap(edge_vertical, n, stbir_info->vertical.scale_info.input_full_size);
+ const void* input_plane_data = ( (char *) stbir_info->input_data ) + (size_t)row * (size_t) stbir_info->input_stride_bytes;
+ stbir__span const * spans = stbir_info->scanline_extents.spans;
+ float * full_decode_buffer = output_buffer - stbir_info->scanline_extents.conservative.n0 * effective_channels;
+ float * last_decoded = 0;
+
+ // if we are on edge_zero, and we get in here with an out of bounds n, then the calculate filters has failed
+ STBIR_ASSERT( !(edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info->vertical.scale_info.input_full_size)) );
+
+ do
+ {
+ float * decode_buffer;
+ void const * input_data;
+ float * end_decode;
+ int width_times_channels;
+ int width;
+
+ if ( spans->n1 < spans->n0 )
+ break;
+
+ width = spans->n1 + 1 - spans->n0;
+ decode_buffer = full_decode_buffer + spans->n0 * effective_channels;
+ end_decode = full_decode_buffer + ( spans->n1 + 1 ) * effective_channels;
+ width_times_channels = width * channels;
+
+ // read directly out of input plane by default
+ input_data = ( (char*)input_plane_data ) + spans->pixel_offset_for_input * input_sample_in_bytes;
+
+ // if we have an input callback, call it to get the input data
+ if ( stbir_info->in_pixels_cb )
+ {
+ // call the callback with a temp buffer (that they can choose to use or not). the temp is just right aligned memory in the decode_buffer itself
+ input_data = stbir_info->in_pixels_cb( ( (char*) end_decode ) - ( width * input_sample_in_bytes ) + ( ( stbir_info->input_type != STBIR_TYPE_FLOAT ) ? ( sizeof(float)*STBIR_INPUT_CALLBACK_PADDING ) : 0 ), input_plane_data, width, spans->pixel_offset_for_input, row, stbir_info->user_data );
+ }
+
+ STBIR_PROFILE_START( decode );
+ // convert the pixels into the float decode_buffer (we index from end_decode, so that when channels != effective_channels, we are right justified in the buffer)
+ stbir_info->decode_pixels( (float*)end_decode - width_times_channels, width_times_channels, input_data );
+ STBIR_PROFILE_END( decode );
+
+ if (stbir_info->alpha_weight)
+ {
+ STBIR_PROFILE_START( alpha );
+ stbir_info->alpha_weight( decode_buffer, width_times_channels );
+ STBIR_PROFILE_END( alpha );
+ }
+
+ ++spans;
+ } while ( spans <= ( &stbir_info->scanline_extents.spans[1] ) );
+
+ // handle the edge_wrap filter (all other types are handled back out at the calculate_filter stage)
+ // basically the idea here is that if we have the whole scanline in memory, we don't redecode the
+ // wrapped edge pixels, and instead just memcpy them from the scanline into the edge positions
+ if ( ( edge_horizontal == STBIR_EDGE_WRAP ) && ( stbir_info->scanline_extents.edge_sizes[0] | stbir_info->scanline_extents.edge_sizes[1] ) )
+ {
+ // this code only runs if we're in edge_wrap, and we're doing the entire scanline
+ int e, start_x[2];
+ int input_full_size = stbir_info->horizontal.scale_info.input_full_size;
+
+ start_x[0] = -stbir_info->scanline_extents.edge_sizes[0]; // left edge start x
+ start_x[1] = input_full_size; // right edge
+
+ for( e = 0; e < 2 ; e++ )
+ {
+ // do each margin
+ int margin = stbir_info->scanline_extents.edge_sizes[e];
+ if ( margin )
+ {
+ int x = start_x[e];
+ float * marg = full_decode_buffer + x * effective_channels;
+ float const * src = full_decode_buffer + stbir__edge_wrap(edge_horizontal, x, input_full_size) * effective_channels;
+ STBIR_MEMCPY( marg, src, margin * effective_channels * sizeof(float) );
+ if ( e == 1 ) last_decoded = marg + margin * effective_channels;
+ }
+ }
+ }
+
+ // some of the horizontal gathers read one float off the edge (which is masked out), but we force a zero here to make sure no NaNs leak in
+ // (we can't pre-zero it, because the input callback can use that area as padding)
+ last_decoded[0] = 0.0f;
+
+ // we clear this extra float, because the final output pixel filter kernel might have used one less coeff than the max filter width
+ // when this happens, we do read that pixel from the input, so it too could be NaN, so just zero an extra one.
+ // this fits because each scanline is padded by three floats (STBIR_INPUT_CALLBACK_PADDING)
+ last_decoded[1] = 0.0f;
+}
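The edge-wrap handling above copies already-decoded in-range pixels into the margin positions rather than redecoding them; the source index comes from wrapping the out-of-range x back into the scanline. An illustrative, simplified version of that coordinate mapping (the library's `stbir__edge_wrap` is more general and covers all four edge modes):

```c
#include <assert.h>

// illustrative wrap of an out-of-range coordinate into [0, size), as used
// when memcpy'ing wrapped edge pixels. simplified sketch: assumes the
// margin never exceeds one full period, unlike the library's version.
static int edge_wrap_sketch(int x, int size)
{
    if (x < 0)     return x + size;
    if (x >= size) return x - size;
    return x;
}
```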
+
+
+//=================
+// Do 1 channel horizontal routines
+
+#ifdef STBIR_SIMD
+
+#define stbir__1_coeff_only() \
+ stbir__simdf tot,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1( c, hc ); \
+ stbir__simdf_mult1_mem( tot, c, decode );
+
+#define stbir__2_coeff_only() \
+ stbir__simdf tot,c,d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2z( c, hc ); \
+ stbir__simdf_load2( d, decode ); \
+ stbir__simdf_mult( tot, c, d ); \
+ stbir__simdf_0123to1230( c, tot ); \
+ stbir__simdf_add1( tot, tot, c );
+
+#define stbir__3_coeff_only() \
+ stbir__simdf tot,c,t; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( c, hc ); \
+ stbir__simdf_mult_mem( tot, c, decode ); \
+ stbir__simdf_0123to1230( c, tot ); \
+ stbir__simdf_0123to2301( t, tot ); \
+ stbir__simdf_add1( tot, tot, c ); \
+ stbir__simdf_add1( tot, tot, t );
+
+#define stbir__store_output_tiny() \
+ stbir__simdf_store1( output, tot ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 1;
+
+#define stbir__4_coeff_start() \
+ stbir__simdf tot,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( c, hc ); \
+ stbir__simdf_mult_mem( tot, c, decode );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( c, hc + (ofs) ); \
+ stbir__simdf_madd_mem( tot, tot, c, decode+(ofs) );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ { stbir__simdf d; \
+ stbir__simdf_load1z( c, hc + (ofs) ); \
+ stbir__simdf_load1( d, decode + (ofs) ); \
+ stbir__simdf_madd( tot, tot, d, c ); }
+
+#define stbir__2_coeff_remnant( ofs ) \
+ { stbir__simdf d; \
+ stbir__simdf_load2z( c, hc+(ofs) ); \
+ stbir__simdf_load2( d, decode+(ofs) ); \
+ stbir__simdf_madd( tot, tot, d, c ); }
+
+#define stbir__3_coeff_setup() \
+ stbir__simdf mask; \
+ stbir__simdf_load( mask, STBIR_mask + 3 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ stbir__simdf_load( c, hc+(ofs) ); \
+ stbir__simdf_and( c, c, mask ); \
+ stbir__simdf_madd_mem( tot, tot, c, decode+(ofs) );
+
+#define stbir__store_output() \
+ stbir__simdf_0123to2301( c, tot ); \
+ stbir__simdf_add( tot, tot, c ); \
+ stbir__simdf_0123to1230( c, tot ); \
+ stbir__simdf_add1( tot, tot, c ); \
+ stbir__simdf_store1( output, tot ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 1;
+
+#else
+
+#define stbir__1_coeff_only() \
+ float tot; \
+ tot = decode[0]*hc[0];
+
+#define stbir__2_coeff_only() \
+ float tot; \
+ tot = decode[0] * hc[0]; \
+ tot += decode[1] * hc[1];
+
+#define stbir__3_coeff_only() \
+ float tot; \
+ tot = decode[0] * hc[0]; \
+ tot += decode[1] * hc[1]; \
+ tot += decode[2] * hc[2];
+
+#define stbir__store_output_tiny() \
+ output[0] = tot; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 1;
+
+#define stbir__4_coeff_start() \
+ float tot0,tot1,tot2,tot3; \
+ tot0 = decode[0] * hc[0]; \
+ tot1 = decode[1] * hc[1]; \
+ tot2 = decode[2] * hc[2]; \
+ tot3 = decode[3] * hc[3];
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ tot0 += decode[0+(ofs)] * hc[0+(ofs)]; \
+ tot1 += decode[1+(ofs)] * hc[1+(ofs)]; \
+ tot2 += decode[2+(ofs)] * hc[2+(ofs)]; \
+ tot3 += decode[3+(ofs)] * hc[3+(ofs)];
+
+#define stbir__1_coeff_remnant( ofs ) \
+ tot0 += decode[0+(ofs)] * hc[0+(ofs)];
+
+#define stbir__2_coeff_remnant( ofs ) \
+ tot0 += decode[0+(ofs)] * hc[0+(ofs)]; \
+ tot1 += decode[1+(ofs)] * hc[1+(ofs)]; \
+
+#define stbir__3_coeff_remnant( ofs ) \
+ tot0 += decode[0+(ofs)] * hc[0+(ofs)]; \
+ tot1 += decode[1+(ofs)] * hc[1+(ofs)]; \
+ tot2 += decode[2+(ofs)] * hc[2+(ofs)];
+
+#define stbir__store_output() \
+ output[0] = (tot0+tot2)+(tot1+tot3); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 1;
+
+#endif
+
+#define STBIR__horizontal_channels 1
+#define STB_IMAGE_RESIZE_DO_HORIZONTALS
+#include STBIR__HEADER_FILENAME
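The macro pairs above are stamped into concrete horizontal-resample loops by the `STB_IMAGE_RESIZE_DO_HORIZONTALS` include. In the scalar path, four independent accumulators keep the multiply-adds from serializing on one running sum. A minimal sketch (hypothetical name) of the inner product those macros expand into for one 1-channel output sample:

```c
#include <assert.h>

// sketch of the scalar 1-channel horizontal gather the macros above build:
// stride-4 groups feed four accumulators (as in stbir__4_coeff_start /
// _continue_from_4), a 1..3 coefficient remnant goes to tot0, and the
// sums are combined pairwise exactly as in stbir__store_output().
static float gather_1ch_sketch(const float *decode, const float *hc, int n)
{
    float tot0 = 0, tot1 = 0, tot2 = 0, tot3 = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
    {
        tot0 += decode[i+0] * hc[i+0];
        tot1 += decode[i+1] * hc[i+1];
        tot2 += decode[i+2] * hc[i+2];
        tot3 += decode[i+3] * hc[i+3];
    }
    for (; i < n; i++)      // 1..3 coefficient remnant
        tot0 += decode[i] * hc[i];
    return (tot0 + tot2) + (tot1 + tot3);
}
```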
+
+
+//=================
+// Do 2 channel horizontal routines
+
+#ifdef STBIR_SIMD
+
+#define stbir__1_coeff_only() \
+ stbir__simdf tot,c,d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1z( c, hc ); \
+ stbir__simdf_0123to0011( c, c ); \
+ stbir__simdf_load2( d, decode ); \
+ stbir__simdf_mult( tot, d, c );
+
+#define stbir__2_coeff_only() \
+ stbir__simdf tot,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2( c, hc ); \
+ stbir__simdf_0123to0011( c, c ); \
+ stbir__simdf_mult_mem( tot, c, decode );
+
+#define stbir__3_coeff_only() \
+ stbir__simdf tot,c,cs,d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0011( c, cs ); \
+ stbir__simdf_mult_mem( tot, c, decode ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_load2z( d, decode+4 ); \
+ stbir__simdf_madd( tot, tot, d, c );
+
+#define stbir__store_output_tiny() \
+ stbir__simdf_0123to2301( c, tot ); \
+ stbir__simdf_add( tot, tot, c ); \
+ stbir__simdf_store2( output, tot ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 2;
+
+#ifdef STBIR_SIMD8
+
+#define stbir__4_coeff_start() \
+ stbir__simdf8 tot0,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc ); \
+ stbir__simdf8_0123to00112233( c, cs ); \
+ stbir__simdf8_mult_mem( tot0, c, decode );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00112233( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*2 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ { stbir__simdf t,d; \
+ stbir__simdf_load1z( t, hc + (ofs) ); \
+ stbir__simdf_load2( d, decode + (ofs) * 2 ); \
+ stbir__simdf_0123to0011( t, t ); \
+ stbir__simdf_mult( t, t, d ); \
+ stbir__simdf8_add4( tot0, tot0, t ); }
+
+#define stbir__2_coeff_remnant( ofs ) \
+ { stbir__simdf t; \
+ stbir__simdf_load2( t, hc + (ofs) ); \
+ stbir__simdf_0123to0011( t, t ); \
+ stbir__simdf_mult_mem( t, t, decode+(ofs)*2 ); \
+ stbir__simdf8_add4( tot0, tot0, t ); }
+
+#define stbir__3_coeff_remnant( ofs ) \
+ { stbir__simdf8 d; \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00112233( c, cs ); \
+ stbir__simdf8_load6z( d, decode+(ofs)*2 ); \
+ stbir__simdf8_madd( tot0, tot0, c, d ); }
+
+#define stbir__store_output() \
+ { stbir__simdf t,d; \
+ stbir__simdf8_add4halves( t, stbir__if_simdf8_cast_to_simdf4(tot0), tot0 ); \
+ stbir__simdf_0123to2301( d, t ); \
+ stbir__simdf_add( t, t, d ); \
+ stbir__simdf_store2( output, t ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 2; }
+
+#else
+
+#define stbir__4_coeff_start() \
+ stbir__simdf tot0,tot1,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0011( c, cs ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_0123to2233( c, cs ); \
+ stbir__simdf_mult_mem( tot1, c, decode+4 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0011( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*2 ); \
+ stbir__simdf_0123to2233( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*2+4 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ { stbir__simdf d; \
+ stbir__simdf_load1z( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0011( c, cs ); \
+ stbir__simdf_load2( d, decode + (ofs) * 2 ); \
+ stbir__simdf_madd( tot0, tot0, d, c ); }
+
+#define stbir__2_coeff_remnant( ofs ) \
+ stbir__simdf_load2( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0011( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*2 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ { stbir__simdf d; \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0011( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*2 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_load2z( d, decode + (ofs) * 2 + 4 ); \
+ stbir__simdf_madd( tot1, tot1, d, c ); }
+
+#define stbir__store_output() \
+ stbir__simdf_add( tot0, tot0, tot1 ); \
+ stbir__simdf_0123to2301( c, tot0 ); \
+ stbir__simdf_add( tot0, tot0, c ); \
+ stbir__simdf_store2( output, tot0 ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 2;
+
+#endif
+
+#else
+
+#define stbir__1_coeff_only() \
+ float tota,totb,c; \
+ c = hc[0]; \
+ tota = decode[0]*c; \
+ totb = decode[1]*c;
+
+#define stbir__2_coeff_only() \
+ float tota,totb,c; \
+ c = hc[0]; \
+ tota = decode[0]*c; \
+ totb = decode[1]*c; \
+ c = hc[1]; \
+ tota += decode[2]*c; \
+ totb += decode[3]*c;
+
+// this weird order of add matches the simd
+#define stbir__3_coeff_only() \
+ float tota,totb,c; \
+ c = hc[0]; \
+ tota = decode[0]*c; \
+ totb = decode[1]*c; \
+ c = hc[2]; \
+ tota += decode[4]*c; \
+ totb += decode[5]*c; \
+ c = hc[1]; \
+ tota += decode[2]*c; \
+ totb += decode[3]*c;
+
+#define stbir__store_output_tiny() \
+ output[0] = tota; \
+ output[1] = totb; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 2;
+
+#define stbir__4_coeff_start() \
+ float tota0,tota1,tota2,tota3,totb0,totb1,totb2,totb3,c; \
+ c = hc[0]; \
+ tota0 = decode[0]*c; \
+ totb0 = decode[1]*c; \
+ c = hc[1]; \
+ tota1 = decode[2]*c; \
+ totb1 = decode[3]*c; \
+ c = hc[2]; \
+ tota2 = decode[4]*c; \
+ totb2 = decode[5]*c; \
+ c = hc[3]; \
+ tota3 = decode[6]*c; \
+ totb3 = decode[7]*c;
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*2]*c; \
+ totb0 += decode[1+(ofs)*2]*c; \
+ c = hc[1+(ofs)]; \
+ tota1 += decode[2+(ofs)*2]*c; \
+ totb1 += decode[3+(ofs)*2]*c; \
+ c = hc[2+(ofs)]; \
+ tota2 += decode[4+(ofs)*2]*c; \
+ totb2 += decode[5+(ofs)*2]*c; \
+ c = hc[3+(ofs)]; \
+ tota3 += decode[6+(ofs)*2]*c; \
+ totb3 += decode[7+(ofs)*2]*c;
+
+#define stbir__1_coeff_remnant( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*2] * c; \
+ totb0 += decode[1+(ofs)*2] * c;
+
+#define stbir__2_coeff_remnant( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*2] * c; \
+ totb0 += decode[1+(ofs)*2] * c; \
+ c = hc[1+(ofs)]; \
+ tota1 += decode[2+(ofs)*2] * c; \
+ totb1 += decode[3+(ofs)*2] * c;
+
+#define stbir__3_coeff_remnant( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*2] * c; \
+ totb0 += decode[1+(ofs)*2] * c; \
+ c = hc[1+(ofs)]; \
+ tota1 += decode[2+(ofs)*2] * c; \
+ totb1 += decode[3+(ofs)*2] * c; \
+ c = hc[2+(ofs)]; \
+ tota2 += decode[4+(ofs)*2] * c; \
+ totb2 += decode[5+(ofs)*2] * c;
+
+#define stbir__store_output() \
+ output[0] = (tota0+tota2)+(tota1+tota3); \
+ output[1] = (totb0+totb2)+(totb1+totb3); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 2;
+
+#endif
+
+#define STBIR__horizontal_channels 2
+#define STB_IMAGE_RESIZE_DO_HORIZONTALS
+#include STBIR__HEADER_FILENAME
+
+
+//=================
+// Do 3 channel horizontal routines
+
+#ifdef STBIR_SIMD
+
+#define stbir__1_coeff_only() \
+ stbir__simdf tot,c,d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1z( c, hc ); \
+ stbir__simdf_0123to0001( c, c ); \
+ stbir__simdf_load( d, decode ); \
+ stbir__simdf_mult( tot, d, c );
+
+#define stbir__2_coeff_only() \
+ stbir__simdf tot,c,cs,d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_load( d, decode ); \
+ stbir__simdf_mult( tot, d, c ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_load( d, decode+3 ); \
+ stbir__simdf_madd( tot, tot, d, c );
+
+#define stbir__3_coeff_only() \
+ stbir__simdf tot,c,d,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_load( d, decode ); \
+ stbir__simdf_mult( tot, d, c ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_load( d, decode+3 ); \
+ stbir__simdf_madd( tot, tot, d, c ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_load( d, decode+6 ); \
+ stbir__simdf_madd( tot, tot, d, c );
+
+#define stbir__store_output_tiny() \
+ stbir__simdf_store2( output, tot ); \
+ stbir__simdf_0123to2301( tot, tot ); \
+ stbir__simdf_store1( output+2, tot ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 3;
+
+#ifdef STBIR_SIMD8
+
+// we're loading from the XXXYYY decode by -1 to get the XXXYYY into different halves of the AVX reg fyi
+#define stbir__4_coeff_start() \
+ stbir__simdf8 tot0,tot1,c,cs; stbir__simdf t; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc ); \
+ stbir__simdf8_0123to00001111( c, cs ); \
+ stbir__simdf8_mult_mem( tot0, c, decode - 1 ); \
+ stbir__simdf8_0123to22223333( c, cs ); \
+ stbir__simdf8_mult_mem( tot1, c, decode+6 - 1 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00001111( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*3 - 1 ); \
+ stbir__simdf8_0123to22223333( c, cs ); \
+ stbir__simdf8_madd_mem( tot1, tot1, c, decode+(ofs)*3 + 6 - 1 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1rep4( t, hc + (ofs) ); \
+ stbir__simdf8_madd_mem4( tot0, tot0, t, decode+(ofs)*3 - 1 );
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) - 2 ); \
+ stbir__simdf8_0123to22223333( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*3 - 1 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00001111( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*3 - 1 ); \
+ stbir__simdf8_0123to2222( t, cs ); \
+ stbir__simdf8_madd_mem4( tot1, tot1, t, decode+(ofs)*3 + 6 - 1 );
+
+#define stbir__store_output() \
+ stbir__simdf8_add( tot0, tot0, tot1 ); \
+ stbir__simdf_0123to1230( t, stbir__if_simdf8_cast_to_simdf4( tot0 ) ); \
+ stbir__simdf8_add4halves( t, t, tot0 ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 3; \
+ if ( output < output_end ) \
+ { \
+ stbir__simdf_store( output-3, t ); \
+ continue; \
+ } \
+ { stbir__simdf tt; stbir__simdf_0123to2301( tt, t ); \
+ stbir__simdf_store2( output-3, t ); \
+ stbir__simdf_store1( output+2-3, tt ); } \
+ break;
+
+
+#else
+
+#define stbir__4_coeff_start() \
+ stbir__simdf tot0,tot1,tot2,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0001( c, cs ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_0123to1122( c, cs ); \
+ stbir__simdf_mult_mem( tot1, c, decode+4 ); \
+ stbir__simdf_0123to2333( c, cs ); \
+ stbir__simdf_mult_mem( tot2, c, decode+8 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0001( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*3 ); \
+ stbir__simdf_0123to1122( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*3+4 ); \
+ stbir__simdf_0123to2333( c, cs ); \
+ stbir__simdf_madd_mem( tot2, tot2, c, decode+(ofs)*3+8 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1z( c, hc + (ofs) ); \
+ stbir__simdf_0123to0001( c, c ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*3 );
+
+#define stbir__2_coeff_remnant( ofs ) \
+ { stbir__simdf d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2z( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0001( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*3 ); \
+ stbir__simdf_0123to1122( c, cs ); \
+ stbir__simdf_load2z( d, decode+(ofs)*3+4 ); \
+ stbir__simdf_madd( tot1, tot1, c, d ); }
+
+#define stbir__3_coeff_remnant( ofs ) \
+ { stbir__simdf d; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0001( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*3 ); \
+ stbir__simdf_0123to1122( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*3+4 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_load1z( d, decode+(ofs)*3+8 ); \
+ stbir__simdf_madd( tot2, tot2, c, d ); }
+
+#define stbir__store_output() \
+ stbir__simdf_0123ABCDto3ABx( c, tot0, tot1 ); \
+ stbir__simdf_0123ABCDto23Ax( cs, tot1, tot2 ); \
+ stbir__simdf_0123to1230( tot2, tot2 ); \
+ stbir__simdf_add( tot0, tot0, cs ); \
+ stbir__simdf_add( c, c, tot2 ); \
+ stbir__simdf_add( tot0, tot0, c ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 3; \
+ if ( output < output_end ) \
+ { \
+ stbir__simdf_store( output-3, tot0 ); \
+ continue; \
+ } \
+ stbir__simdf_0123to2301( tot1, tot0 ); \
+ stbir__simdf_store2( output-3, tot0 ); \
+ stbir__simdf_store1( output+2-3, tot1 ); \
+ break;
+
+#endif
+
+#else
+
+#define stbir__1_coeff_only() \
+ float tot0, tot1, tot2, c; \
+ c = hc[0]; \
+ tot0 = decode[0]*c; \
+ tot1 = decode[1]*c; \
+ tot2 = decode[2]*c;
+
+#define stbir__2_coeff_only() \
+ float tot0, tot1, tot2, c; \
+ c = hc[0]; \
+ tot0 = decode[0]*c; \
+ tot1 = decode[1]*c; \
+ tot2 = decode[2]*c; \
+ c = hc[1]; \
+ tot0 += decode[3]*c; \
+ tot1 += decode[4]*c; \
+ tot2 += decode[5]*c;
+
+#define stbir__3_coeff_only() \
+ float tot0, tot1, tot2, c; \
+ c = hc[0]; \
+ tot0 = decode[0]*c; \
+ tot1 = decode[1]*c; \
+ tot2 = decode[2]*c; \
+ c = hc[1]; \
+ tot0 += decode[3]*c; \
+ tot1 += decode[4]*c; \
+ tot2 += decode[5]*c; \
+ c = hc[2]; \
+ tot0 += decode[6]*c; \
+ tot1 += decode[7]*c; \
+ tot2 += decode[8]*c;
+
+#define stbir__store_output_tiny() \
+ output[0] = tot0; \
+ output[1] = tot1; \
+ output[2] = tot2; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 3;
+
+#define stbir__4_coeff_start() \
+ float tota0,tota1,tota2,totb0,totb1,totb2,totc0,totc1,totc2,totd0,totd1,totd2,c; \
+ c = hc[0]; \
+ tota0 = decode[0]*c; \
+ tota1 = decode[1]*c; \
+ tota2 = decode[2]*c; \
+ c = hc[1]; \
+ totb0 = decode[3]*c; \
+ totb1 = decode[4]*c; \
+ totb2 = decode[5]*c; \
+ c = hc[2]; \
+ totc0 = decode[6]*c; \
+ totc1 = decode[7]*c; \
+ totc2 = decode[8]*c; \
+ c = hc[3]; \
+ totd0 = decode[9]*c; \
+ totd1 = decode[10]*c; \
+ totd2 = decode[11]*c;
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*3]*c; \
+ tota1 += decode[1+(ofs)*3]*c; \
+ tota2 += decode[2+(ofs)*3]*c; \
+ c = hc[1+(ofs)]; \
+ totb0 += decode[3+(ofs)*3]*c; \
+ totb1 += decode[4+(ofs)*3]*c; \
+ totb2 += decode[5+(ofs)*3]*c; \
+ c = hc[2+(ofs)]; \
+ totc0 += decode[6+(ofs)*3]*c; \
+ totc1 += decode[7+(ofs)*3]*c; \
+ totc2 += decode[8+(ofs)*3]*c; \
+ c = hc[3+(ofs)]; \
+ totd0 += decode[9+(ofs)*3]*c; \
+ totd1 += decode[10+(ofs)*3]*c; \
+ totd2 += decode[11+(ofs)*3]*c;
+
+#define stbir__1_coeff_remnant( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*3]*c; \
+ tota1 += decode[1+(ofs)*3]*c; \
+ tota2 += decode[2+(ofs)*3]*c;
+
+#define stbir__2_coeff_remnant( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*3]*c; \
+ tota1 += decode[1+(ofs)*3]*c; \
+ tota2 += decode[2+(ofs)*3]*c; \
+ c = hc[1+(ofs)]; \
+ totb0 += decode[3+(ofs)*3]*c; \
+ totb1 += decode[4+(ofs)*3]*c; \
+ totb2 += decode[5+(ofs)*3]*c;
+
+#define stbir__3_coeff_remnant( ofs ) \
+ c = hc[0+(ofs)]; \
+ tota0 += decode[0+(ofs)*3]*c; \
+ tota1 += decode[1+(ofs)*3]*c; \
+ tota2 += decode[2+(ofs)*3]*c; \
+ c = hc[1+(ofs)]; \
+ totb0 += decode[3+(ofs)*3]*c; \
+ totb1 += decode[4+(ofs)*3]*c; \
+ totb2 += decode[5+(ofs)*3]*c; \
+ c = hc[2+(ofs)]; \
+ totc0 += decode[6+(ofs)*3]*c; \
+ totc1 += decode[7+(ofs)*3]*c; \
+ totc2 += decode[8+(ofs)*3]*c;
+
+#define stbir__store_output() \
+ output[0] = (tota0+totc0)+(totb0+totd0); \
+ output[1] = (tota1+totc1)+(totb1+totd1); \
+ output[2] = (tota2+totc2)+(totb2+totd2); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 3;
+
+#endif
+
+#define STBIR__horizontal_channels 3
+#define STB_IMAGE_RESIZE_DO_HORIZONTALS
+#include STBIR__HEADER_FILENAME
+
+//=================
+// Do 4 channel horizontal routines
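+//
+// (Overview, inferred from the pattern above: each channel-count section
+// defines the same macro "protocol" consumed by the
+// STB_IMAGE_RESIZE_DO_HORIZONTALS include that follows it --
+// stbir__1/2/3_coeff_only for short filter kernels,
+// stbir__4_coeff_start / stbir__4_coeff_continue_from_4 plus
+// stbir__N_coeff_remnant for longer ones, and stbir__store_output[_tiny]
+// to write one output pixel and advance the coefficient/contributor cursors.)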
+
+#ifdef STBIR_SIMD
+
+#define stbir__1_coeff_only() \
+ stbir__simdf tot,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1( c, hc ); \
+ stbir__simdf_0123to0000( c, c ); \
+ stbir__simdf_mult_mem( tot, c, decode );
+
+#define stbir__2_coeff_only() \
+ stbir__simdf tot,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_mult_mem( tot, c, decode ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot, tot, c, decode+4 );
+
+#define stbir__3_coeff_only() \
+ stbir__simdf tot,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_mult_mem( tot, c, decode ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot, tot, c, decode+4 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot, tot, c, decode+8 );
+
+#define stbir__store_output_tiny() \
+ stbir__simdf_store( output, tot ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 4;
+
+#ifdef STBIR_SIMD8
+
+#define stbir__4_coeff_start() \
+ stbir__simdf8 tot0,c,cs; stbir__simdf t; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc ); \
+ stbir__simdf8_0123to00001111( c, cs ); \
+ stbir__simdf8_mult_mem( tot0, c, decode ); \
+ stbir__simdf8_0123to22223333( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+8 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00001111( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*4 ); \
+ stbir__simdf8_0123to22223333( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*4+8 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1rep4( t, hc + (ofs) ); \
+ stbir__simdf8_madd_mem4( tot0, tot0, t, decode+(ofs)*4 );
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) - 2 ); \
+ stbir__simdf8_0123to22223333( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*4 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00001111( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*4 ); \
+ stbir__simdf8_0123to2222( t, cs ); \
+ stbir__simdf8_madd_mem4( tot0, tot0, t, decode+(ofs)*4+8 );
+
+#define stbir__store_output() \
+ stbir__simdf8_add4halves( t, stbir__if_simdf8_cast_to_simdf4(tot0), tot0 ); \
+ stbir__simdf_store( output, t ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 4;
+
+#else
+
+#define stbir__4_coeff_start() \
+ stbir__simdf tot0,tot1,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_mult_mem( tot1, c, decode+4 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+8 ); \
+ stbir__simdf_0123to3333( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+12 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*4 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*4+4 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*4+8 ); \
+ stbir__simdf_0123to3333( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*4+12 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1( c, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, c ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*4 );
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*4 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*4+4 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*4 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*4+4 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*4+8 );
+
+#define stbir__store_output() \
+ stbir__simdf_add( tot0, tot0, tot1 ); \
+ stbir__simdf_store( output, tot0 ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 4;
+
+#endif
+
+#else
+
+#define stbir__1_coeff_only() \
+ float p0,p1,p2,p3,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0]; \
+ p0 = decode[0] * c; \
+ p1 = decode[1] * c; \
+ p2 = decode[2] * c; \
+ p3 = decode[3] * c;
+
+#define stbir__2_coeff_only() \
+ float p0,p1,p2,p3,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0]; \
+ p0 = decode[0] * c; \
+ p1 = decode[1] * c; \
+ p2 = decode[2] * c; \
+ p3 = decode[3] * c; \
+ c = hc[1]; \
+ p0 += decode[4] * c; \
+ p1 += decode[5] * c; \
+ p2 += decode[6] * c; \
+ p3 += decode[7] * c;
+
+#define stbir__3_coeff_only() \
+ float p0,p1,p2,p3,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0]; \
+ p0 = decode[0] * c; \
+ p1 = decode[1] * c; \
+ p2 = decode[2] * c; \
+ p3 = decode[3] * c; \
+ c = hc[1]; \
+ p0 += decode[4] * c; \
+ p1 += decode[5] * c; \
+ p2 += decode[6] * c; \
+ p3 += decode[7] * c; \
+ c = hc[2]; \
+ p0 += decode[8] * c; \
+ p1 += decode[9] * c; \
+ p2 += decode[10] * c; \
+ p3 += decode[11] * c;
+
+#define stbir__store_output_tiny() \
+ output[0] = p0; \
+ output[1] = p1; \
+ output[2] = p2; \
+ output[3] = p3; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 4;
+
+#define stbir__4_coeff_start() \
+ float x0,x1,x2,x3,y0,y1,y2,y3,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0]; \
+ x0 = decode[0] * c; \
+ x1 = decode[1] * c; \
+ x2 = decode[2] * c; \
+ x3 = decode[3] * c; \
+ c = hc[1]; \
+ y0 = decode[4] * c; \
+ y1 = decode[5] * c; \
+ y2 = decode[6] * c; \
+ y3 = decode[7] * c; \
+ c = hc[2]; \
+ x0 += decode[8] * c; \
+ x1 += decode[9] * c; \
+ x2 += decode[10] * c; \
+ x3 += decode[11] * c; \
+ c = hc[3]; \
+ y0 += decode[12] * c; \
+ y1 += decode[13] * c; \
+ y2 += decode[14] * c; \
+ y3 += decode[15] * c;
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*4] * c; \
+ x1 += decode[1+(ofs)*4] * c; \
+ x2 += decode[2+(ofs)*4] * c; \
+ x3 += decode[3+(ofs)*4] * c; \
+ c = hc[1+(ofs)]; \
+ y0 += decode[4+(ofs)*4] * c; \
+ y1 += decode[5+(ofs)*4] * c; \
+ y2 += decode[6+(ofs)*4] * c; \
+ y3 += decode[7+(ofs)*4] * c; \
+ c = hc[2+(ofs)]; \
+ x0 += decode[8+(ofs)*4] * c; \
+ x1 += decode[9+(ofs)*4] * c; \
+ x2 += decode[10+(ofs)*4] * c; \
+ x3 += decode[11+(ofs)*4] * c; \
+ c = hc[3+(ofs)]; \
+ y0 += decode[12+(ofs)*4] * c; \
+ y1 += decode[13+(ofs)*4] * c; \
+ y2 += decode[14+(ofs)*4] * c; \
+ y3 += decode[15+(ofs)*4] * c;
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*4] * c; \
+ x1 += decode[1+(ofs)*4] * c; \
+ x2 += decode[2+(ofs)*4] * c; \
+ x3 += decode[3+(ofs)*4] * c;
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*4] * c; \
+ x1 += decode[1+(ofs)*4] * c; \
+ x2 += decode[2+(ofs)*4] * c; \
+ x3 += decode[3+(ofs)*4] * c; \
+ c = hc[1+(ofs)]; \
+ y0 += decode[4+(ofs)*4] * c; \
+ y1 += decode[5+(ofs)*4] * c; \
+ y2 += decode[6+(ofs)*4] * c; \
+ y3 += decode[7+(ofs)*4] * c;
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*4] * c; \
+ x1 += decode[1+(ofs)*4] * c; \
+ x2 += decode[2+(ofs)*4] * c; \
+ x3 += decode[3+(ofs)*4] * c; \
+ c = hc[1+(ofs)]; \
+ y0 += decode[4+(ofs)*4] * c; \
+ y1 += decode[5+(ofs)*4] * c; \
+ y2 += decode[6+(ofs)*4] * c; \
+ y3 += decode[7+(ofs)*4] * c; \
+ c = hc[2+(ofs)]; \
+ x0 += decode[8+(ofs)*4] * c; \
+ x1 += decode[9+(ofs)*4] * c; \
+ x2 += decode[10+(ofs)*4] * c; \
+ x3 += decode[11+(ofs)*4] * c;
+
+#define stbir__store_output() \
+ output[0] = x0 + y0; \
+ output[1] = x1 + y1; \
+ output[2] = x2 + y2; \
+ output[3] = x3 + y3; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 4;
+
+#endif
+
+#define STBIR__horizontal_channels 4
+#define STB_IMAGE_RESIZE_DO_HORIZONTALS
+#include STBIR__HEADER_FILENAME
+
+
+
+//=================
+// Do 7 channel horizontal routines
+
+#ifdef STBIR_SIMD
+
+#define stbir__1_coeff_only() \
+ stbir__simdf tot0,tot1,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1( c, hc ); \
+ stbir__simdf_0123to0000( c, c ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_mult_mem( tot1, c, decode+3 );
+
+#define stbir__2_coeff_only() \
+ stbir__simdf tot0,tot1,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_mult_mem( tot1, c, decode+3 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+7 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+10 );
+
+#define stbir__3_coeff_only() \
+ stbir__simdf tot0,tot1,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_mult_mem( tot1, c, decode+3 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+7 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+10 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+14 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+17 );
+
+#define stbir__store_output_tiny() \
+ stbir__simdf_store( output+3, tot1 ); \
+ stbir__simdf_store( output, tot0 ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 7;
+
+#ifdef STBIR_SIMD8
+
+#define stbir__4_coeff_start() \
+ stbir__simdf8 tot0,tot1,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc ); \
+ stbir__simdf8_0123to00000000( c, cs ); \
+ stbir__simdf8_mult_mem( tot0, c, decode ); \
+ stbir__simdf8_0123to11111111( c, cs ); \
+ stbir__simdf8_mult_mem( tot1, c, decode+7 ); \
+ stbir__simdf8_0123to22222222( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+14 ); \
+ stbir__simdf8_0123to33333333( c, cs ); \
+ stbir__simdf8_madd_mem( tot1, tot1, c, decode+21 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00000000( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf8_0123to11111111( c, cs ); \
+ stbir__simdf8_madd_mem( tot1, tot1, c, decode+(ofs)*7+7 ); \
+ stbir__simdf8_0123to22222222( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*7+14 ); \
+ stbir__simdf8_0123to33333333( c, cs ); \
+ stbir__simdf8_madd_mem( tot1, tot1, c, decode+(ofs)*7+21 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load1b( c, hc + (ofs) ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*7 );
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load1b( c, hc + (ofs) ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf8_load1b( c, hc + (ofs)+1 ); \
+ stbir__simdf8_madd_mem( tot1, tot1, c, decode+(ofs)*7+7 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf8_load4b( cs, hc + (ofs) ); \
+ stbir__simdf8_0123to00000000( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf8_0123to11111111( c, cs ); \
+ stbir__simdf8_madd_mem( tot1, tot1, c, decode+(ofs)*7+7 ); \
+ stbir__simdf8_0123to22222222( c, cs ); \
+ stbir__simdf8_madd_mem( tot0, tot0, c, decode+(ofs)*7+14 );
+
+#define stbir__store_output() \
+ stbir__simdf8_add( tot0, tot0, tot1 ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 7; \
+ if ( output < output_end ) \
+ { \
+ stbir__simdf8_store( output-7, tot0 ); \
+ continue; \
+ } \
+ stbir__simdf_store( output-7+3, stbir__simdf_swiz(stbir__simdf8_gettop4(tot0),0,0,1,2) ); \
+ stbir__simdf_store( output-7, stbir__if_simdf8_cast_to_simdf4(tot0) ); \
+ break;
+
+#else
+
+#define stbir__4_coeff_start() \
+ stbir__simdf tot0,tot1,tot2,tot3,c,cs; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_mult_mem( tot0, c, decode ); \
+ stbir__simdf_mult_mem( tot1, c, decode+3 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_mult_mem( tot2, c, decode+7 ); \
+ stbir__simdf_mult_mem( tot3, c, decode+10 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+14 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+17 ); \
+ stbir__simdf_0123to3333( c, cs ); \
+ stbir__simdf_madd_mem( tot2, tot2, c, decode+21 ); \
+ stbir__simdf_madd_mem( tot3, tot3, c, decode+24 );
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*7+3 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot2, tot2, c, decode+(ofs)*7+7 ); \
+ stbir__simdf_madd_mem( tot3, tot3, c, decode+(ofs)*7+10 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*7+14 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*7+17 ); \
+ stbir__simdf_0123to3333( c, cs ); \
+ stbir__simdf_madd_mem( tot2, tot2, c, decode+(ofs)*7+21 ); \
+ stbir__simdf_madd_mem( tot3, tot3, c, decode+(ofs)*7+24 );
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load1( c, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, c ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*7+3 );
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load2( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*7+3 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot2, tot2, c, decode+(ofs)*7+7 ); \
+ stbir__simdf_madd_mem( tot3, tot3, c, decode+(ofs)*7+10 );
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ stbir__simdf_load( cs, hc + (ofs) ); \
+ stbir__simdf_0123to0000( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*7 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*7+3 ); \
+ stbir__simdf_0123to1111( c, cs ); \
+ stbir__simdf_madd_mem( tot2, tot2, c, decode+(ofs)*7+7 ); \
+ stbir__simdf_madd_mem( tot3, tot3, c, decode+(ofs)*7+10 ); \
+ stbir__simdf_0123to2222( c, cs ); \
+ stbir__simdf_madd_mem( tot0, tot0, c, decode+(ofs)*7+14 ); \
+ stbir__simdf_madd_mem( tot1, tot1, c, decode+(ofs)*7+17 );
+
+#define stbir__store_output() \
+ stbir__simdf_add( tot0, tot0, tot2 ); \
+ stbir__simdf_add( tot1, tot1, tot3 ); \
+ stbir__simdf_store( output+3, tot1 ); \
+ stbir__simdf_store( output, tot0 ); \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 7;
+
+#endif
+
+#else
+
+#define stbir__1_coeff_only() \
+ float tot0, tot1, tot2, tot3, tot4, tot5, tot6, c; \
+ c = hc[0]; \
+ tot0 = decode[0]*c; \
+ tot1 = decode[1]*c; \
+ tot2 = decode[2]*c; \
+ tot3 = decode[3]*c; \
+ tot4 = decode[4]*c; \
+ tot5 = decode[5]*c; \
+ tot6 = decode[6]*c;
+
+#define stbir__2_coeff_only() \
+ float tot0, tot1, tot2, tot3, tot4, tot5, tot6, c; \
+ c = hc[0]; \
+ tot0 = decode[0]*c; \
+ tot1 = decode[1]*c; \
+ tot2 = decode[2]*c; \
+ tot3 = decode[3]*c; \
+ tot4 = decode[4]*c; \
+ tot5 = decode[5]*c; \
+ tot6 = decode[6]*c; \
+ c = hc[1]; \
+ tot0 += decode[7]*c; \
+ tot1 += decode[8]*c; \
+ tot2 += decode[9]*c; \
+ tot3 += decode[10]*c; \
+ tot4 += decode[11]*c; \
+ tot5 += decode[12]*c; \
+ tot6 += decode[13]*c;
+
+#define stbir__3_coeff_only() \
+ float tot0, tot1, tot2, tot3, tot4, tot5, tot6, c; \
+ c = hc[0]; \
+ tot0 = decode[0]*c; \
+ tot1 = decode[1]*c; \
+ tot2 = decode[2]*c; \
+ tot3 = decode[3]*c; \
+ tot4 = decode[4]*c; \
+ tot5 = decode[5]*c; \
+ tot6 = decode[6]*c; \
+ c = hc[1]; \
+ tot0 += decode[7]*c; \
+ tot1 += decode[8]*c; \
+ tot2 += decode[9]*c; \
+ tot3 += decode[10]*c; \
+ tot4 += decode[11]*c; \
+ tot5 += decode[12]*c; \
+ tot6 += decode[13]*c; \
+ c = hc[2]; \
+ tot0 += decode[14]*c; \
+ tot1 += decode[15]*c; \
+ tot2 += decode[16]*c; \
+ tot3 += decode[17]*c; \
+ tot4 += decode[18]*c; \
+ tot5 += decode[19]*c; \
+ tot6 += decode[20]*c;
+
+#define stbir__store_output_tiny() \
+ output[0] = tot0; \
+ output[1] = tot1; \
+ output[2] = tot2; \
+ output[3] = tot3; \
+ output[4] = tot4; \
+ output[5] = tot5; \
+ output[6] = tot6; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 7;
+
+#define stbir__4_coeff_start() \
+ float x0,x1,x2,x3,x4,x5,x6,y0,y1,y2,y3,y4,y5,y6,c; \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0]; \
+ x0 = decode[0] * c; \
+ x1 = decode[1] * c; \
+ x2 = decode[2] * c; \
+ x3 = decode[3] * c; \
+ x4 = decode[4] * c; \
+ x5 = decode[5] * c; \
+ x6 = decode[6] * c; \
+ c = hc[1]; \
+ y0 = decode[7] * c; \
+ y1 = decode[8] * c; \
+ y2 = decode[9] * c; \
+ y3 = decode[10] * c; \
+ y4 = decode[11] * c; \
+ y5 = decode[12] * c; \
+ y6 = decode[13] * c; \
+ c = hc[2]; \
+ x0 += decode[14] * c; \
+ x1 += decode[15] * c; \
+ x2 += decode[16] * c; \
+ x3 += decode[17] * c; \
+ x4 += decode[18] * c; \
+ x5 += decode[19] * c; \
+ x6 += decode[20] * c; \
+ c = hc[3]; \
+ y0 += decode[21] * c; \
+ y1 += decode[22] * c; \
+ y2 += decode[23] * c; \
+ y3 += decode[24] * c; \
+ y4 += decode[25] * c; \
+ y5 += decode[26] * c; \
+ y6 += decode[27] * c;
+
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*7] * c; \
+ x1 += decode[1+(ofs)*7] * c; \
+ x2 += decode[2+(ofs)*7] * c; \
+ x3 += decode[3+(ofs)*7] * c; \
+ x4 += decode[4+(ofs)*7] * c; \
+ x5 += decode[5+(ofs)*7] * c; \
+ x6 += decode[6+(ofs)*7] * c; \
+ c = hc[1+(ofs)]; \
+ y0 += decode[7+(ofs)*7] * c; \
+ y1 += decode[8+(ofs)*7] * c; \
+ y2 += decode[9+(ofs)*7] * c; \
+ y3 += decode[10+(ofs)*7] * c; \
+ y4 += decode[11+(ofs)*7] * c; \
+ y5 += decode[12+(ofs)*7] * c; \
+ y6 += decode[13+(ofs)*7] * c; \
+ c = hc[2+(ofs)]; \
+ x0 += decode[14+(ofs)*7] * c; \
+ x1 += decode[15+(ofs)*7] * c; \
+ x2 += decode[16+(ofs)*7] * c; \
+ x3 += decode[17+(ofs)*7] * c; \
+ x4 += decode[18+(ofs)*7] * c; \
+ x5 += decode[19+(ofs)*7] * c; \
+ x6 += decode[20+(ofs)*7] * c; \
+ c = hc[3+(ofs)]; \
+ y0 += decode[21+(ofs)*7] * c; \
+ y1 += decode[22+(ofs)*7] * c; \
+ y2 += decode[23+(ofs)*7] * c; \
+ y3 += decode[24+(ofs)*7] * c; \
+ y4 += decode[25+(ofs)*7] * c; \
+ y5 += decode[26+(ofs)*7] * c; \
+ y6 += decode[27+(ofs)*7] * c;
+
+#define stbir__1_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*7] * c; \
+ x1 += decode[1+(ofs)*7] * c; \
+ x2 += decode[2+(ofs)*7] * c; \
+ x3 += decode[3+(ofs)*7] * c; \
+ x4 += decode[4+(ofs)*7] * c; \
+ x5 += decode[5+(ofs)*7] * c; \
+ x6 += decode[6+(ofs)*7] * c;
+
+#define stbir__2_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*7] * c; \
+ x1 += decode[1+(ofs)*7] * c; \
+ x2 += decode[2+(ofs)*7] * c; \
+ x3 += decode[3+(ofs)*7] * c; \
+ x4 += decode[4+(ofs)*7] * c; \
+ x5 += decode[5+(ofs)*7] * c; \
+ x6 += decode[6+(ofs)*7] * c; \
+ c = hc[1+(ofs)]; \
+ y0 += decode[7+(ofs)*7] * c; \
+ y1 += decode[8+(ofs)*7] * c; \
+ y2 += decode[9+(ofs)*7] * c; \
+ y3 += decode[10+(ofs)*7] * c; \
+ y4 += decode[11+(ofs)*7] * c; \
+ y5 += decode[12+(ofs)*7] * c; \
+ y6 += decode[13+(ofs)*7] * c;
+
+#define stbir__3_coeff_remnant( ofs ) \
+ STBIR_SIMD_NO_UNROLL(decode); \
+ c = hc[0+(ofs)]; \
+ x0 += decode[0+(ofs)*7] * c; \
+ x1 += decode[1+(ofs)*7] * c; \
+ x2 += decode[2+(ofs)*7] * c; \
+ x3 += decode[3+(ofs)*7] * c; \
+ x4 += decode[4+(ofs)*7] * c; \
+ x5 += decode[5+(ofs)*7] * c; \
+ x6 += decode[6+(ofs)*7] * c; \
+ c = hc[1+(ofs)]; \
+ y0 += decode[7+(ofs)*7] * c; \
+ y1 += decode[8+(ofs)*7] * c; \
+ y2 += decode[9+(ofs)*7] * c; \
+ y3 += decode[10+(ofs)*7] * c; \
+ y4 += decode[11+(ofs)*7] * c; \
+ y5 += decode[12+(ofs)*7] * c; \
+ y6 += decode[13+(ofs)*7] * c; \
+ c = hc[2+(ofs)]; \
+ x0 += decode[14+(ofs)*7] * c; \
+ x1 += decode[15+(ofs)*7] * c; \
+ x2 += decode[16+(ofs)*7] * c; \
+ x3 += decode[17+(ofs)*7] * c; \
+ x4 += decode[18+(ofs)*7] * c; \
+ x5 += decode[19+(ofs)*7] * c; \
+ x6 += decode[20+(ofs)*7] * c;
+
+#define stbir__store_output() \
+ output[0] = x0 + y0; \
+ output[1] = x1 + y1; \
+ output[2] = x2 + y2; \
+ output[3] = x3 + y3; \
+ output[4] = x4 + y4; \
+ output[5] = x5 + y5; \
+ output[6] = x6 + y6; \
+ horizontal_coefficients += coefficient_width; \
+ ++horizontal_contributors; \
+ output += 7;
+
+#endif
+
+#define STBIR__horizontal_channels 7
+#define STB_IMAGE_RESIZE_DO_HORIZONTALS
+#include STBIR__HEADER_FILENAME
+
+
+// include all of the vertical resamplers (both scatter and gather versions)
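+// (Each channel count is instantiated twice: once plain, and once with
+// STB_IMAGE_RESIZE_VERTICAL_CONTINUE defined, which generates the "_cont"
+// ("continue") variants referenced by the function tables further below.)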
+
+#define STBIR__vertical_channels 1
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 1
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 2
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 2
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 3
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 3
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 4
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 4
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 5
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 5
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 6
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 6
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 7
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 7
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 8
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#include STBIR__HEADER_FILENAME
+
+#define STBIR__vertical_channels 8
+#define STB_IMAGE_RESIZE_DO_VERTICALS
+#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#include STBIR__HEADER_FILENAME
+
+typedef void STBIR_VERTICAL_GATHERFUNC( float * output, float const * coeffs, float const ** inputs, float const * input0_end );
+
+static STBIR_VERTICAL_GATHERFUNC * stbir__vertical_gathers[ 8 ] =
+{
+ stbir__vertical_gather_with_1_coeffs,
+ stbir__vertical_gather_with_2_coeffs,
+ stbir__vertical_gather_with_3_coeffs,
+ stbir__vertical_gather_with_4_coeffs,
+ stbir__vertical_gather_with_5_coeffs,
+ stbir__vertical_gather_with_6_coeffs,
+ stbir__vertical_gather_with_7_coeffs,
+ stbir__vertical_gather_with_8_coeffs
+};
+
+static STBIR_VERTICAL_GATHERFUNC * stbir__vertical_gathers_continues[ 8 ] =
+{
+ stbir__vertical_gather_with_1_coeffs_cont,
+ stbir__vertical_gather_with_2_coeffs_cont,
+ stbir__vertical_gather_with_3_coeffs_cont,
+ stbir__vertical_gather_with_4_coeffs_cont,
+ stbir__vertical_gather_with_5_coeffs_cont,
+ stbir__vertical_gather_with_6_coeffs_cont,
+ stbir__vertical_gather_with_7_coeffs_cont,
+ stbir__vertical_gather_with_8_coeffs_cont
+};
+
+typedef void STBIR_VERTICAL_SCATTERFUNC( float ** outputs, float const * coeffs, float const * input, float const * input_end );
+
+static STBIR_VERTICAL_SCATTERFUNC * stbir__vertical_scatter_sets[ 8 ] =
+{
+ stbir__vertical_scatter_with_1_coeffs,stbir__vertical_scatter_with_2_coeffs,stbir__vertical_scatter_with_3_coeffs,stbir__vertical_scatter_with_4_coeffs,stbir__vertical_scatter_with_5_coeffs,stbir__vertical_scatter_with_6_coeffs,stbir__vertical_scatter_with_7_coeffs,stbir__vertical_scatter_with_8_coeffs
+};
+
+static STBIR_VERTICAL_SCATTERFUNC * stbir__vertical_scatter_blends[ 8 ] =
+{
+ stbir__vertical_scatter_with_1_coeffs_cont,stbir__vertical_scatter_with_2_coeffs_cont,stbir__vertical_scatter_with_3_coeffs_cont,stbir__vertical_scatter_with_4_coeffs_cont,stbir__vertical_scatter_with_5_coeffs_cont,stbir__vertical_scatter_with_6_coeffs_cont,stbir__vertical_scatter_with_7_coeffs_cont,stbir__vertical_scatter_with_8_coeffs_cont
+};
+
+
+static void stbir__encode_scanline( stbir__info const * stbir_info, void *output_buffer_data, float * encode_buffer, int row STBIR_ONLY_PROFILE_GET_SPLIT_INFO )
+{
+ int num_pixels = stbir_info->horizontal.scale_info.output_sub_size;
+ int channels = stbir_info->channels;
+ int width_times_channels = num_pixels * channels;
+ void * output_buffer;
+
+ // un-alpha weight if we need to
+ if ( stbir_info->alpha_unweight )
+ {
+ STBIR_PROFILE_START( unalpha );
+ stbir_info->alpha_unweight( encode_buffer, width_times_channels );
+ STBIR_PROFILE_END( unalpha );
+ }
+
+ // write directly into output by default
+ output_buffer = output_buffer_data;
+
+  // if we have an output callback, we first convert in place in the encode buffer (and then hand that to the callback)
+ if ( stbir_info->out_pixels_cb )
+ output_buffer = encode_buffer;
+
+ STBIR_PROFILE_START( encode );
+ // convert into the output buffer
+ stbir_info->encode_pixels( output_buffer, width_times_channels, encode_buffer );
+ STBIR_PROFILE_END( encode );
+
+ // if we have an output callback, call it to send the data
+ if ( stbir_info->out_pixels_cb )
+ stbir_info->out_pixels_cb( output_buffer, num_pixels, row, stbir_info->user_data );
+}
+
+
+// Get the ring buffer pointer for an index
+static float* stbir__get_ring_buffer_entry(stbir__info const * stbir_info, stbir__per_split_info const * split_info, int index )
+{
+ STBIR_ASSERT( index < stbir_info->ring_buffer_num_entries );
+
+ #ifdef STBIR__SEPARATE_ALLOCATIONS
+ return split_info->ring_buffers[ index ];
+ #else
+ return (float*) ( ( (char*) split_info->ring_buffer ) + ( index * stbir_info->ring_buffer_length_bytes ) );
+ #endif
+}
+
+// Get the specified scan line from the ring buffer
+static float* stbir__get_ring_buffer_scanline(stbir__info const * stbir_info, stbir__per_split_info const * split_info, int get_scanline)
+{
+ int ring_buffer_index = (split_info->ring_buffer_begin_index + (get_scanline - split_info->ring_buffer_first_scanline)) % stbir_info->ring_buffer_num_entries;
+ return stbir__get_ring_buffer_entry( stbir_info, split_info, ring_buffer_index );
+}
+
+static void stbir__resample_horizontal_gather(stbir__info const * stbir_info, float* output_buffer, float const * input_buffer STBIR_ONLY_PROFILE_GET_SPLIT_INFO )
+{
+ float const * decode_buffer = input_buffer - ( stbir_info->scanline_extents.conservative.n0 * stbir_info->effective_channels );
+
+ STBIR_PROFILE_START( horizontal );
+ if ( ( stbir_info->horizontal.filter_enum == STBIR_FILTER_POINT_SAMPLE ) && ( stbir_info->horizontal.scale_info.scale == 1.0f ) )
+ STBIR_MEMCPY( output_buffer, input_buffer, stbir_info->horizontal.scale_info.output_sub_size * sizeof( float ) * stbir_info->effective_channels );
+ else
+ stbir_info->horizontal_gather_channels( output_buffer, stbir_info->horizontal.scale_info.output_sub_size, decode_buffer, stbir_info->horizontal.contributors, stbir_info->horizontal.coefficients, stbir_info->horizontal.coefficient_width );
+ STBIR_PROFILE_END( horizontal );
+}
+
+static void stbir__resample_vertical_gather(stbir__info const * stbir_info, stbir__per_split_info* split_info, int n, int contrib_n0, int contrib_n1, float const * vertical_coefficients )
+{
+ float* encode_buffer = split_info->vertical_buffer;
+ float* decode_buffer = split_info->decode_buffer;
+ int vertical_first = stbir_info->vertical_first;
+ int width = (vertical_first) ? ( stbir_info->scanline_extents.conservative.n1-stbir_info->scanline_extents.conservative.n0+1 ) : stbir_info->horizontal.scale_info.output_sub_size;
+ int width_times_channels = stbir_info->effective_channels * width;
+
+ STBIR_ASSERT( stbir_info->vertical.is_gather );
+
+ // loop over the contributing scanlines and scale into the buffer
+ STBIR_PROFILE_START( vertical );
+ {
+ int k = 0, total = contrib_n1 - contrib_n0 + 1;
+ STBIR_ASSERT( total > 0 );
+ do {
+ float const * inputs[8];
+ int i, cnt = total; if ( cnt > 8 ) cnt = 8;
+ for( i = 0 ; i < cnt ; i++ )
+ inputs[ i ] = stbir__get_ring_buffer_scanline(stbir_info, split_info, k+i+contrib_n0 );
+
+ // call the N scanlines at a time function (up to 8 scanlines of blending at once)
+ ((k==0)?stbir__vertical_gathers:stbir__vertical_gathers_continues)[cnt-1]( (vertical_first) ? decode_buffer : encode_buffer, vertical_coefficients + k, inputs, inputs[0] + width_times_channels );
+ k += cnt;
+ total -= cnt;
+ } while ( total );
+ }
+ STBIR_PROFILE_END( vertical );
+
+ if ( vertical_first )
+ {
+ // Now resample the gathered vertical data in the horizontal axis into the encode buffer
+ decode_buffer[ width_times_channels ] = 0.0f; // clear two over for horizontals with a remnant of 3
+ decode_buffer[ width_times_channels+1 ] = 0.0f;
+ stbir__resample_horizontal_gather(stbir_info, encode_buffer, decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+ }
+
+ stbir__encode_scanline( stbir_info, ( (char *) stbir_info->output_data ) + ((size_t)n * (size_t)stbir_info->output_stride_bytes),
+ encode_buffer, n STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+}
+
+static void stbir__decode_and_resample_for_vertical_gather_loop(stbir__info const * stbir_info, stbir__per_split_info* split_info, int n)
+{
+ int ring_buffer_index;
+ float* ring_buffer;
+
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline( stbir_info, n, split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // update new end scanline
+ split_info->ring_buffer_last_scanline = n;
+
+ // get ring buffer
+ ring_buffer_index = (split_info->ring_buffer_begin_index + (split_info->ring_buffer_last_scanline - split_info->ring_buffer_first_scanline)) % stbir_info->ring_buffer_num_entries;
+ ring_buffer = stbir__get_ring_buffer_entry(stbir_info, split_info, ring_buffer_index);
+
+ // Now resample it into the ring buffer.
+ stbir__resample_horizontal_gather( stbir_info, ring_buffer, split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
+}
+
+static void stbir__vertical_gather_loop( stbir__info const * stbir_info, stbir__per_split_info* split_info, int split_count )
+{
+ int y, start_output_y, end_output_y;
+ stbir__contributors* vertical_contributors = stbir_info->vertical.contributors;
+ float const * vertical_coefficients = stbir_info->vertical.coefficients;
+
+ STBIR_ASSERT( stbir_info->vertical.is_gather );
+
+ start_output_y = split_info->start_output_y;
+ end_output_y = split_info[split_count-1].end_output_y;
+
+ vertical_contributors += start_output_y;
+ vertical_coefficients += start_output_y * stbir_info->vertical.coefficient_width;
+
+ // initialize the ring buffer for gathering
+ split_info->ring_buffer_begin_index = 0;
+ split_info->ring_buffer_first_scanline = vertical_contributors->n0;
+ split_info->ring_buffer_last_scanline = split_info->ring_buffer_first_scanline - 1; // means "empty"
+
+ for (y = start_output_y; y < end_output_y; y++)
+ {
+ int in_first_scanline, in_last_scanline;
+
+ in_first_scanline = vertical_contributors->n0;
+ in_last_scanline = vertical_contributors->n1;
+
+ // make sure the indexing hasn't broken
+ STBIR_ASSERT( in_first_scanline >= split_info->ring_buffer_first_scanline );
+
+ // Load in new scanlines
+ while (in_last_scanline > split_info->ring_buffer_last_scanline)
+ {
+ STBIR_ASSERT( ( split_info->ring_buffer_last_scanline - split_info->ring_buffer_first_scanline + 1 ) <= stbir_info->ring_buffer_num_entries );
+
+      // if the ring buffer is full, evict the oldest scanline to make room for the new one
+ if ( ( split_info->ring_buffer_last_scanline - split_info->ring_buffer_first_scanline + 1 ) == stbir_info->ring_buffer_num_entries )
+ {
+ split_info->ring_buffer_first_scanline++;
+ split_info->ring_buffer_begin_index++;
+ }
+
+ if ( stbir_info->vertical_first )
+ {
+ float * ring_buffer = stbir__get_ring_buffer_scanline( stbir_info, split_info, ++split_info->ring_buffer_last_scanline );
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline( stbir_info, split_info->ring_buffer_last_scanline, ring_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+ }
+ else
+ {
+ stbir__decode_and_resample_for_vertical_gather_loop(stbir_info, split_info, split_info->ring_buffer_last_scanline + 1);
+ }
+ }
+
+ // Now all buffers should be ready to write a row of vertical sampling, so do it.
+ stbir__resample_vertical_gather(stbir_info, split_info, y, in_first_scanline, in_last_scanline, vertical_coefficients );
+
+ ++vertical_contributors;
+ vertical_coefficients += stbir_info->vertical.coefficient_width;
+ }
+}
+
+#define STBIR__FLOAT_EMPTY_MARKER 3.0e+38F
+#define STBIR__FLOAT_BUFFER_IS_EMPTY(ptr) ((ptr)[0]==STBIR__FLOAT_EMPTY_MARKER)
+
+static void stbir__encode_first_scanline_from_scatter(stbir__info const * stbir_info, stbir__per_split_info* split_info)
+{
+ // evict a scanline out into the output buffer
+ float* ring_buffer_entry = stbir__get_ring_buffer_entry(stbir_info, split_info, split_info->ring_buffer_begin_index );
+
+ // dump the scanline out
+ stbir__encode_scanline( stbir_info, ( (char *)stbir_info->output_data ) + ( (size_t)split_info->ring_buffer_first_scanline * (size_t)stbir_info->output_stride_bytes ), ring_buffer_entry, split_info->ring_buffer_first_scanline STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // mark it as empty
+ ring_buffer_entry[ 0 ] = STBIR__FLOAT_EMPTY_MARKER;
+
+ // advance the first scanline
+ split_info->ring_buffer_first_scanline++;
+ if ( ++split_info->ring_buffer_begin_index == stbir_info->ring_buffer_num_entries )
+ split_info->ring_buffer_begin_index = 0;
+}
+
+static void stbir__horizontal_resample_and_encode_first_scanline_from_scatter(stbir__info const * stbir_info, stbir__per_split_info* split_info)
+{
+ // evict a scanline out into the output buffer
+
+ float* ring_buffer_entry = stbir__get_ring_buffer_entry(stbir_info, split_info, split_info->ring_buffer_begin_index );
+
+ // Now resample it into the buffer.
+ stbir__resample_horizontal_gather( stbir_info, split_info->vertical_buffer, ring_buffer_entry STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // dump the scanline out
+ stbir__encode_scanline( stbir_info, ( (char *)stbir_info->output_data ) + ( (size_t)split_info->ring_buffer_first_scanline * (size_t)stbir_info->output_stride_bytes ), split_info->vertical_buffer, split_info->ring_buffer_first_scanline STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // mark it as empty
+ ring_buffer_entry[ 0 ] = STBIR__FLOAT_EMPTY_MARKER;
+
+ // advance the first scanline
+ split_info->ring_buffer_first_scanline++;
+ if ( ++split_info->ring_buffer_begin_index == stbir_info->ring_buffer_num_entries )
+ split_info->ring_buffer_begin_index = 0;
+}
+
+static void stbir__resample_vertical_scatter(stbir__info const * stbir_info, stbir__per_split_info* split_info, int n0, int n1, float const * vertical_coefficients, float const * vertical_buffer, float const * vertical_buffer_end )
+{
+ STBIR_ASSERT( !stbir_info->vertical.is_gather );
+
+ STBIR_PROFILE_START( vertical );
+ {
+ int k = 0, total = n1 - n0 + 1;
+ STBIR_ASSERT( total > 0 );
+ do {
+ float * outputs[8];
+ int i, n = total; if ( n > 8 ) n = 8;
+ for( i = 0 ; i < n ; i++ )
+ {
+ outputs[ i ] = stbir__get_ring_buffer_scanline(stbir_info, split_info, k+i+n0 );
+ if ( ( i ) && ( STBIR__FLOAT_BUFFER_IS_EMPTY( outputs[i] ) != STBIR__FLOAT_BUFFER_IS_EMPTY( outputs[0] ) ) ) // make sure runs are of the same type
+ {
+ n = i;
+ break;
+ }
+ }
+ // call the scatter to N scanlines at a time function (up to 8 scanlines of scattering at once)
+ ((STBIR__FLOAT_BUFFER_IS_EMPTY( outputs[0] ))?stbir__vertical_scatter_sets:stbir__vertical_scatter_blends)[n-1]( outputs, vertical_coefficients + k, vertical_buffer, vertical_buffer_end );
+ k += n;
+ total -= n;
+ } while ( total );
+ }
+
+ STBIR_PROFILE_END( vertical );
+}
+
+typedef void stbir__handle_scanline_for_scatter_func(stbir__info const * stbir_info, stbir__per_split_info* split_info);
+
+static void stbir__vertical_scatter_loop( stbir__info const * stbir_info, stbir__per_split_info* split_info, int split_count )
+{
+ int y, start_output_y, end_output_y, start_input_y, end_input_y;
+ stbir__contributors* vertical_contributors = stbir_info->vertical.contributors;
+ float const * vertical_coefficients = stbir_info->vertical.coefficients;
+ stbir__handle_scanline_for_scatter_func * handle_scanline_for_scatter;
+ void * scanline_scatter_buffer;
+ void * scanline_scatter_buffer_end;
+ int on_first_input_y, last_input_y;
+ int width = (stbir_info->vertical_first) ? ( stbir_info->scanline_extents.conservative.n1-stbir_info->scanline_extents.conservative.n0+1 ) : stbir_info->horizontal.scale_info.output_sub_size;
+ int width_times_channels = stbir_info->effective_channels * width;
+
+ STBIR_ASSERT( !stbir_info->vertical.is_gather );
+
+ start_output_y = split_info->start_output_y;
+ end_output_y = split_info[split_count-1].end_output_y; // may do multiple split counts
+
+ start_input_y = split_info->start_input_y;
+ end_input_y = split_info[split_count-1].end_input_y;
+
+ // adjust for starting offset start_input_y
+ y = start_input_y + stbir_info->vertical.filter_pixel_margin;
+ vertical_contributors += y ;
+ vertical_coefficients += stbir_info->vertical.coefficient_width * y;
+
+ if ( stbir_info->vertical_first )
+ {
+ handle_scanline_for_scatter = stbir__horizontal_resample_and_encode_first_scanline_from_scatter;
+ scanline_scatter_buffer = split_info->decode_buffer;
+ scanline_scatter_buffer_end = ( (char*) scanline_scatter_buffer ) + sizeof( float ) * stbir_info->effective_channels * (stbir_info->scanline_extents.conservative.n1-stbir_info->scanline_extents.conservative.n0+1);
+ }
+ else
+ {
+ handle_scanline_for_scatter = stbir__encode_first_scanline_from_scatter;
+ scanline_scatter_buffer = split_info->vertical_buffer;
+ scanline_scatter_buffer_end = ( (char*) scanline_scatter_buffer ) + sizeof( float ) * stbir_info->effective_channels * stbir_info->horizontal.scale_info.output_sub_size;
+ }
+
+ // initialize the ring buffer for scattering
+ split_info->ring_buffer_first_scanline = start_output_y;
+ split_info->ring_buffer_last_scanline = -1;
+ split_info->ring_buffer_begin_index = -1;
+
+ // mark all the buffers as empty to start
+ for( y = 0 ; y < stbir_info->ring_buffer_num_entries ; y++ )
+ {
+ float * decode_buffer = stbir__get_ring_buffer_entry( stbir_info, split_info, y );
+ decode_buffer[ width_times_channels ] = 0.0f; // clear two over for horizontals with a remnant of 3
+ decode_buffer[ width_times_channels+1 ] = 0.0f;
+ decode_buffer[0] = STBIR__FLOAT_EMPTY_MARKER; // only used on scatter
+ }
+
+ // do the loop in input space
+ on_first_input_y = 1; last_input_y = start_input_y;
+ for (y = start_input_y ; y < end_input_y; y++)
+ {
+ int out_first_scanline, out_last_scanline;
+
+ out_first_scanline = vertical_contributors->n0;
+ out_last_scanline = vertical_contributors->n1;
+
+ STBIR_ASSERT(out_last_scanline - out_first_scanline + 1 <= stbir_info->ring_buffer_num_entries);
+
+ if ( ( out_last_scanline >= out_first_scanline ) && ( ( ( out_first_scanline >= start_output_y ) && ( out_first_scanline < end_output_y ) ) || ( ( out_last_scanline >= start_output_y ) && ( out_last_scanline < end_output_y ) ) ) )
+ {
+ float const * vc = vertical_coefficients;
+
+ // keep track of the range actually seen for the next resize
+ last_input_y = y;
+ if ( ( on_first_input_y ) && ( y > start_input_y ) )
+ split_info->start_input_y = y;
+ on_first_input_y = 0;
+
+ // clip the region
+ if ( out_first_scanline < start_output_y )
+ {
+ vc += start_output_y - out_first_scanline;
+ out_first_scanline = start_output_y;
+ }
+
+ if ( out_last_scanline >= end_output_y )
+ out_last_scanline = end_output_y - 1;
+
+ // if very first scanline, init the index
+ if (split_info->ring_buffer_begin_index < 0)
+ split_info->ring_buffer_begin_index = out_first_scanline - start_output_y;
+
+ STBIR_ASSERT( split_info->ring_buffer_begin_index <= out_first_scanline );
+
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline( stbir_info, y, split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // When horizontal first, we resample horizontally into the vertical buffer before we scatter it out
+ if ( !stbir_info->vertical_first )
+ stbir__resample_horizontal_gather( stbir_info, split_info->vertical_buffer, split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO );
+
+ // Now it's sitting in the buffer ready to be distributed into the ring buffers.
+
+      // evict from the ring buffer if it's full and we need room for new scanlines
+ if ( ( ( split_info->ring_buffer_last_scanline - split_info->ring_buffer_first_scanline + 1 ) == stbir_info->ring_buffer_num_entries ) &&
+ ( out_last_scanline > split_info->ring_buffer_last_scanline ) )
+ handle_scanline_for_scatter( stbir_info, split_info );
+
+ // Now the horizontal buffer is ready to write to all ring buffer rows, so do it.
+ stbir__resample_vertical_scatter(stbir_info, split_info, out_first_scanline, out_last_scanline, vc, (float*)scanline_scatter_buffer, (float*)scanline_scatter_buffer_end );
+
+ // update the end of the buffer
+ if ( out_last_scanline > split_info->ring_buffer_last_scanline )
+ split_info->ring_buffer_last_scanline = out_last_scanline;
+ }
+ ++vertical_contributors;
+ vertical_coefficients += stbir_info->vertical.coefficient_width;
+ }
+
+ // now evict the scanlines that are left over in the ring buffer
+ while ( split_info->ring_buffer_first_scanline < end_output_y )
+ handle_scanline_for_scatter(stbir_info, split_info);
+
+ // update the end_input_y if we do multiple resizes with the same data
+ ++last_input_y;
+ for( y = 0 ; y < split_count; y++ )
+ if ( split_info[y].end_input_y > last_input_y )
+ split_info[y].end_input_y = last_input_y;
+}
+
+
+static stbir__kernel_callback * stbir__builtin_kernels[] = { 0, stbir__filter_trapezoid, stbir__filter_triangle, stbir__filter_cubic, stbir__filter_catmullrom, stbir__filter_mitchell, stbir__filter_point };
+static stbir__support_callback * stbir__builtin_supports[] = { 0, stbir__support_trapezoid, stbir__support_one, stbir__support_two, stbir__support_two, stbir__support_two, stbir__support_zeropoint5 };
+
+static void stbir__set_sampler(stbir__sampler * samp, stbir_filter filter, stbir__kernel_callback * kernel, stbir__support_callback * support, stbir_edge edge, stbir__scale_info * scale_info, int always_gather, void * user_data )
+{
+ // set filter
+ if (filter == 0)
+ {
+ filter = STBIR_DEFAULT_FILTER_DOWNSAMPLE; // default to downsample
+ if (scale_info->scale >= ( 1.0f - stbir__small_float ) )
+ {
+ if ( (scale_info->scale <= ( 1.0f + stbir__small_float ) ) && ( STBIR_CEILF(scale_info->pixel_shift) == scale_info->pixel_shift ) )
+ filter = STBIR_FILTER_POINT_SAMPLE;
+ else
+ filter = STBIR_DEFAULT_FILTER_UPSAMPLE;
+ }
+ }
+ samp->filter_enum = filter;
+
+ STBIR_ASSERT(samp->filter_enum != 0);
+ STBIR_ASSERT((unsigned)samp->filter_enum < STBIR_FILTER_OTHER);
+ samp->filter_kernel = stbir__builtin_kernels[ filter ];
+ samp->filter_support = stbir__builtin_supports[ filter ];
+
+ if ( kernel && support )
+ {
+ samp->filter_kernel = kernel;
+ samp->filter_support = support;
+ samp->filter_enum = STBIR_FILTER_OTHER;
+ }
+
+ samp->edge = edge;
+ samp->filter_pixel_width = stbir__get_filter_pixel_width (samp->filter_support, scale_info->scale, user_data );
+  // Gather is always better, but in extreme downsamples, you have to have most or all of the data in memory
+ // For horizontal, we always have all the pixels, so we always use gather here (always_gather==1).
+ // For vertical, we use gather if scaling up (which means we will have samp->filter_pixel_width
+ // scanlines in memory at once).
+ samp->is_gather = 0;
+ if ( scale_info->scale >= ( 1.0f - stbir__small_float ) )
+ samp->is_gather = 1;
+ else if ( ( always_gather ) || ( samp->filter_pixel_width <= STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT ) )
+ samp->is_gather = 2;
+
+ // pre calculate stuff based on the above
+ samp->coefficient_width = stbir__get_coefficient_width(samp, samp->is_gather, user_data);
+
+ // filter_pixel_width is the conservative size in pixels of input that affect an output pixel.
+ // In rare cases (only with 2 pix to 1 pix with the default filters), it's possible that the
+ // filter will extend before or after the scanline beyond just one extra entire copy of the
+ // scanline (we would hit the edge twice). We don't let you do that, so we clamp the total
+  // width to 3x the total input pixels (once for the scanline, once for the left side
+  // overhang, and once for the right side). We only do this for wrap edge mode, since the other
+ // modes can just re-edge clamp back in again.
+ if ( edge == STBIR_EDGE_WRAP )
+ if ( samp->filter_pixel_width > ( scale_info->input_full_size * 3 ) )
+ samp->filter_pixel_width = scale_info->input_full_size * 3;
+
+ // This is how much to expand buffers to account for filters seeking outside
+ // the image boundaries.
+ samp->filter_pixel_margin = samp->filter_pixel_width / 2;
+
+ // filter_pixel_margin is the amount that this filter can overhang on just one side of either
+ // end of the scanline (left or the right). Since we only allow you to overhang 1 scanline's
+ // worth of pixels, we clamp this one side of overhang to the input scanline size. Again,
+ // this clamping only happens in rare cases with the default filters (2 pix to 1 pix).
+ if ( edge == STBIR_EDGE_WRAP )
+ if ( samp->filter_pixel_margin > scale_info->input_full_size )
+ samp->filter_pixel_margin = scale_info->input_full_size;
+
+ samp->num_contributors = stbir__get_contributors(samp, samp->is_gather);
+
+ samp->contributors_size = samp->num_contributors * sizeof(stbir__contributors);
+  samp->coefficients_size = samp->num_contributors * samp->coefficient_width * sizeof(float) + sizeof(float)*STBIR_INPUT_CALLBACK_PADDING; // the extra floats are padding for the input callback
+
+ samp->gather_prescatter_contributors = 0;
+ samp->gather_prescatter_coefficients = 0;
+ if ( samp->is_gather == 0 )
+ {
+ samp->gather_prescatter_coefficient_width = samp->filter_pixel_width;
+ samp->gather_prescatter_num_contributors = stbir__get_contributors(samp, 2);
+ samp->gather_prescatter_contributors_size = samp->gather_prescatter_num_contributors * sizeof(stbir__contributors);
+ samp->gather_prescatter_coefficients_size = samp->gather_prescatter_num_contributors * samp->gather_prescatter_coefficient_width * sizeof(float);
+ }
+}
+
+static void stbir__get_conservative_extents( stbir__sampler * samp, stbir__contributors * range, void * user_data )
+{
+ float scale = samp->scale_info.scale;
+ float out_shift = samp->scale_info.pixel_shift;
+ stbir__support_callback * support = samp->filter_support;
+ int input_full_size = samp->scale_info.input_full_size;
+ stbir_edge edge = samp->edge;
+ float inv_scale = samp->scale_info.inv_scale;
+
+ STBIR_ASSERT( samp->is_gather != 0 );
+
+ if ( samp->is_gather == 1 )
+ {
+ int in_first_pixel, in_last_pixel;
+ float out_filter_radius = support(inv_scale, user_data) * scale;
+
+ stbir__calculate_in_pixel_range( &in_first_pixel, &in_last_pixel, 0.5, out_filter_radius, inv_scale, out_shift, input_full_size, edge );
+ range->n0 = in_first_pixel;
+ stbir__calculate_in_pixel_range( &in_first_pixel, &in_last_pixel, ( (float)(samp->scale_info.output_sub_size-1) ) + 0.5f, out_filter_radius, inv_scale, out_shift, input_full_size, edge );
+ range->n1 = in_last_pixel;
+ }
+ else if ( samp->is_gather == 2 ) // downsample gather, refine
+ {
+ float in_pixels_radius = support(scale, user_data) * inv_scale;
+ int filter_pixel_margin = samp->filter_pixel_margin;
+ int output_sub_size = samp->scale_info.output_sub_size;
+ int input_end;
+ int n;
+ int in_first_pixel, in_last_pixel;
+
+ // get a conservative area of the input range
+ stbir__calculate_in_pixel_range( &in_first_pixel, &in_last_pixel, 0, 0, inv_scale, out_shift, input_full_size, edge );
+ range->n0 = in_first_pixel;
+ stbir__calculate_in_pixel_range( &in_first_pixel, &in_last_pixel, (float)output_sub_size, 0, inv_scale, out_shift, input_full_size, edge );
+ range->n1 = in_last_pixel;
+
+    // now go through the margin to the start of the area to find the bottom
+ n = range->n0 + 1;
+ input_end = -filter_pixel_margin;
+ while( n >= input_end )
+ {
+ int out_first_pixel, out_last_pixel;
+ stbir__calculate_out_pixel_range( &out_first_pixel, &out_last_pixel, ((float)n)+0.5f, in_pixels_radius, scale, out_shift, output_sub_size );
+ if ( out_first_pixel > out_last_pixel )
+ break;
+
+ if ( ( out_first_pixel < output_sub_size ) || ( out_last_pixel >= 0 ) )
+ range->n0 = n;
+ --n;
+ }
+
+    // now go through the end of the area and through the margin to find the top
+ n = range->n1 - 1;
+ input_end = n + 1 + filter_pixel_margin;
+ while( n <= input_end )
+ {
+ int out_first_pixel, out_last_pixel;
+ stbir__calculate_out_pixel_range( &out_first_pixel, &out_last_pixel, ((float)n)+0.5f, in_pixels_radius, scale, out_shift, output_sub_size );
+ if ( out_first_pixel > out_last_pixel )
+ break;
+ if ( ( out_first_pixel < output_sub_size ) || ( out_last_pixel >= 0 ) )
+ range->n1 = n;
+ ++n;
+ }
+ }
+
+ if ( samp->edge == STBIR_EDGE_WRAP )
+ {
+ // if we are wrapping, and we are very close to the image size (so the edges might merge), just use the scanline up to the edge
+ if ( ( range->n0 > 0 ) && ( range->n1 >= input_full_size ) )
+ {
+ int marg = range->n1 - input_full_size + 1;
+ if ( ( marg + STBIR__MERGE_RUNS_PIXEL_THRESHOLD ) >= range->n0 )
+ range->n0 = 0;
+ }
+ if ( ( range->n0 < 0 ) && ( range->n1 < (input_full_size-1) ) )
+ {
+ int marg = -range->n0;
+ if ( ( input_full_size - marg - STBIR__MERGE_RUNS_PIXEL_THRESHOLD - 1 ) <= range->n1 )
+ range->n1 = input_full_size - 1;
+ }
+ }
+ else
+ {
+ // for non-edge-wrap modes, we never read over the edge, so clamp
+ if ( range->n0 < 0 )
+ range->n0 = 0;
+ if ( range->n1 >= input_full_size )
+ range->n1 = input_full_size - 1;
+ }
+}
+
+static void stbir__get_split_info( stbir__per_split_info* split_info, int splits, int output_height, int vertical_pixel_margin, int input_full_height, int is_gather, stbir__contributors * contribs )
+{
+ int i, cur;
+ int left = output_height;
+
+ cur = 0;
+ for( i = 0 ; i < splits ; i++ )
+ {
+ int each;
+
+ split_info[i].start_output_y = cur;
+ each = left / ( splits - i );
+ split_info[i].end_output_y = cur + each;
+
+    // ok, when we are gathering, we need to make sure we are starting on a y offset that doesn't have
+    // a "special" set of coefficients. Basically, with exactly the right filter at exactly the right
+    // resize at exactly the right phase, some of the coefficients can be zero. When they are zero, we
+    // don't process them at all. But this leads to a tricky thing with the thread splits, where we
+    // might have a set of two coeffs like this for example: (4,4) and (3,6). The 4,4 means there was
+    // just one single coeff because things worked out perfectly (normally, they all have 4 coeffs,
+    // like the range 3,6). The problem is that if we start right on the (4,4) on a brand new thread,
+    // then when we get to (3,6), we don't have the "3" sample in memory (because we didn't load
+    // it on the initial (4,4) range because it didn't have a 3 - we only add new samples that are
+    // larger than our existing samples, which is just how the eviction works). So, our solution here
+    // is pretty simple: if we start right on a range that has samples that start earlier, then we
+    // simply bump up the previous thread split's range to include it, and then start this thread's
+    // range with the smaller sample. It just moves one scanline from one thread split to another,
+    // so that we end with the unusual one instead of starting with it. To do this, we check 2-4
+    // samples at each thread split start and then occasionally move them.
+
+ if ( ( is_gather ) && ( i ) )
+ {
+ stbir__contributors * small_contribs;
+ int j, smallest, stop, start_n0;
+ stbir__contributors * split_contribs = contribs + cur;
+
+ // scan for a max of 3x the filter width or until the next thread split
+ stop = vertical_pixel_margin * 3;
+ if ( each < stop )
+ stop = each;
+
+ // loops a few times before early out
+ smallest = 0;
+ small_contribs = split_contribs;
+ start_n0 = small_contribs->n0;
+ for( j = 1 ; j <= stop ; j++ )
+ {
+ ++split_contribs;
+ if ( split_contribs->n0 > start_n0 )
+ break;
+ if ( split_contribs->n0 < small_contribs->n0 )
+ {
+ small_contribs = split_contribs;
+ smallest = j;
+ }
+ }
+
+ split_info[i-1].end_output_y += smallest;
+ split_info[i].start_output_y += smallest;
+ }
+
+ cur += each;
+ left -= each;
+
+ // scatter range (updated to minimum as you run it)
+ split_info[i].start_input_y = -vertical_pixel_margin;
+ split_info[i].end_input_y = input_full_height + vertical_pixel_margin;
+ }
+}
+
+static void stbir__free_internal_mem( stbir__info *info )
+{
+ #define STBIR__FREE_AND_CLEAR( ptr ) { if ( ptr ) { void * p = (ptr); (ptr) = 0; STBIR_FREE( p, info->user_data); } }
+
+ if ( info )
+ {
+ #ifndef STBIR__SEPARATE_ALLOCATIONS
+ STBIR__FREE_AND_CLEAR( info->alloced_mem );
+ #else
+ int i,j;
+
+ if ( ( info->vertical.gather_prescatter_contributors ) && ( (void*)info->vertical.gather_prescatter_contributors != (void*)info->split_info[0].decode_buffer ) )
+ {
+ STBIR__FREE_AND_CLEAR( info->vertical.gather_prescatter_coefficients );
+ STBIR__FREE_AND_CLEAR( info->vertical.gather_prescatter_contributors );
+ }
+ for( i = 0 ; i < info->splits ; i++ )
+ {
+ for( j = 0 ; j < info->alloc_ring_buffer_num_entries ; j++ )
+ {
+ #ifdef STBIR_SIMD8
+ if ( info->effective_channels == 3 )
+ --info->split_info[i].ring_buffers[j]; // avx in 3 channel mode needs one float at the start of the buffer
+ #endif
+ STBIR__FREE_AND_CLEAR( info->split_info[i].ring_buffers[j] );
+ }
+
+ #ifdef STBIR_SIMD8
+ if ( info->effective_channels == 3 )
+ --info->split_info[i].decode_buffer; // avx in 3 channel mode needs one float at the start of the buffer
+ #endif
+ STBIR__FREE_AND_CLEAR( info->split_info[i].decode_buffer );
+ STBIR__FREE_AND_CLEAR( info->split_info[i].ring_buffers );
+ STBIR__FREE_AND_CLEAR( info->split_info[i].vertical_buffer );
+ }
+ STBIR__FREE_AND_CLEAR( info->split_info );
+ if ( info->vertical.coefficients != info->horizontal.coefficients )
+ {
+ STBIR__FREE_AND_CLEAR( info->vertical.coefficients );
+ STBIR__FREE_AND_CLEAR( info->vertical.contributors );
+ }
+ STBIR__FREE_AND_CLEAR( info->horizontal.coefficients );
+ STBIR__FREE_AND_CLEAR( info->horizontal.contributors );
+ STBIR__FREE_AND_CLEAR( info->alloced_mem );
+ STBIR_FREE( info, info->user_data );
+ #endif
+ }
+
+ #undef STBIR__FREE_AND_CLEAR
+}
+
+static int stbir__get_max_split( int splits, int height )
+{
+ int i;
+ int max = 0;
+
+ for( i = 0 ; i < splits ; i++ )
+ {
+ int each = height / ( splits - i );
+ if ( each > max )
+ max = each;
+ height -= each;
+ }
+ return max;
+}
+
+static stbir__horizontal_gather_channels_func ** stbir__horizontal_gather_n_coeffs_funcs[8] =
+{
+ 0, stbir__horizontal_gather_1_channels_with_n_coeffs_funcs, stbir__horizontal_gather_2_channels_with_n_coeffs_funcs, stbir__horizontal_gather_3_channels_with_n_coeffs_funcs, stbir__horizontal_gather_4_channels_with_n_coeffs_funcs, 0,0, stbir__horizontal_gather_7_channels_with_n_coeffs_funcs
+};
+
+static stbir__horizontal_gather_channels_func ** stbir__horizontal_gather_channels_funcs[8] =
+{
+ 0, stbir__horizontal_gather_1_channels_funcs, stbir__horizontal_gather_2_channels_funcs, stbir__horizontal_gather_3_channels_funcs, stbir__horizontal_gather_4_channels_funcs, 0,0, stbir__horizontal_gather_7_channels_funcs
+};
+
+// resize classifications (indices 0..7 into the weight tables below; see the bucketing in stbir__should_do_vertical_first): 0 == vertical scatter, 1 == vertical gather <= 1x scale, 2 == vertical gather 1x-2x scale, 3 == vertical gather 2x-3x scale, 5 == vertical gather 3x-4x scale, 6 == vertical gather > 4x scale or <=4 pixel height, 7 == <=4 pixel wide column
+#define STBIR_RESIZE_CLASSIFICATIONS 8
+
+static float stbir__compute_weights[5][STBIR_RESIZE_CLASSIFICATIONS][4]= // 5 channel counts: 0=1chan, 1=2chan, 2=3chan, 3=4chan, 4=7chan
+{
+ {
+ { 1.00000f, 1.00000f, 0.31250f, 1.00000f },
+ { 0.56250f, 0.59375f, 0.00000f, 0.96875f },
+ { 1.00000f, 0.06250f, 0.00000f, 1.00000f },
+ { 0.00000f, 0.09375f, 1.00000f, 1.00000f },
+ { 1.00000f, 1.00000f, 1.00000f, 1.00000f },
+ { 0.03125f, 0.12500f, 1.00000f, 1.00000f },
+ { 0.06250f, 0.12500f, 0.00000f, 1.00000f },
+ { 0.00000f, 1.00000f, 0.00000f, 0.03125f },
+ }, {
+ { 0.00000f, 0.84375f, 0.00000f, 0.03125f },
+ { 0.09375f, 0.93750f, 0.00000f, 0.78125f },
+ { 0.87500f, 0.21875f, 0.00000f, 0.96875f },
+ { 0.09375f, 0.09375f, 1.00000f, 1.00000f },
+ { 1.00000f, 1.00000f, 1.00000f, 1.00000f },
+ { 0.03125f, 0.12500f, 1.00000f, 1.00000f },
+ { 0.06250f, 0.12500f, 0.00000f, 1.00000f },
+ { 0.00000f, 1.00000f, 0.00000f, 0.53125f },
+ }, {
+ { 0.00000f, 0.53125f, 0.00000f, 0.03125f },
+ { 0.06250f, 0.96875f, 0.00000f, 0.53125f },
+ { 0.87500f, 0.18750f, 0.00000f, 0.93750f },
+ { 0.00000f, 0.09375f, 1.00000f, 1.00000f },
+ { 1.00000f, 1.00000f, 1.00000f, 1.00000f },
+ { 0.03125f, 0.12500f, 1.00000f, 1.00000f },
+ { 0.06250f, 0.12500f, 0.00000f, 1.00000f },
+ { 0.00000f, 1.00000f, 0.00000f, 0.56250f },
+ }, {
+ { 0.00000f, 0.50000f, 0.00000f, 0.71875f },
+ { 0.06250f, 0.84375f, 0.00000f, 0.87500f },
+ { 1.00000f, 0.50000f, 0.50000f, 0.96875f },
+ { 1.00000f, 0.09375f, 0.31250f, 0.50000f },
+ { 1.00000f, 1.00000f, 1.00000f, 1.00000f },
+ { 1.00000f, 0.03125f, 0.03125f, 0.53125f },
+ { 0.18750f, 0.12500f, 0.00000f, 1.00000f },
+ { 0.00000f, 1.00000f, 0.03125f, 0.18750f },
+ }, {
+ { 0.00000f, 0.59375f, 0.00000f, 0.96875f },
+ { 0.06250f, 0.81250f, 0.06250f, 0.59375f },
+ { 0.75000f, 0.43750f, 0.12500f, 0.96875f },
+ { 0.87500f, 0.06250f, 0.18750f, 0.43750f },
+ { 1.00000f, 1.00000f, 1.00000f, 1.00000f },
+ { 0.15625f, 0.12500f, 1.00000f, 1.00000f },
+ { 0.06250f, 0.12500f, 0.00000f, 1.00000f },
+ { 0.00000f, 1.00000f, 0.03125f, 0.34375f },
+ }
+};
+
+// structure that allows us to query and override info for training the costs
+typedef struct STBIR__V_FIRST_INFO
+{
+ double v_cost, h_cost;
+  int control_v_first; // 0 = no control, 1 = force horizontal, 2 = force vertical
+ int v_first;
+ int v_resize_classification;
+ int is_gather;
+} STBIR__V_FIRST_INFO;
+
+#ifdef STBIR__V_FIRST_INFO_BUFFER
+static STBIR__V_FIRST_INFO STBIR__V_FIRST_INFO_BUFFER = {0};
+#define STBIR__V_FIRST_INFO_POINTER &STBIR__V_FIRST_INFO_BUFFER
+#else
+#define STBIR__V_FIRST_INFO_POINTER 0
+#endif
+
+// Figure out whether to scale along the horizontal or vertical first.
+// This is only *super* important when you are scaling by a massively
+// different amount in the vertical vs the horizontal (for example, if
+// you are scaling by 2x in the width, and 0.5x in the height, then you
+// want to do the vertical scale first, because it's around 3x faster
+// in that order).
+//
+// In more normal circumstances, this makes a 20-40% difference, so
+// it's good to get right, but not critical. The normal way that you
+// decide which direction goes first is just figuring out which
+// direction does more multiplies. But modern CPUs have fancy caches
+// and SIMD and high IPC abilities, so there's just a lot more that
+// goes into it.
+//
+// My handwavy sort of solution is to have an app that does a whole
+// bunch of timing for both vertical and horizontal first modes,
+// and then another app that can read lots of these timing files
+// and try to search for the best weights to use. Dotimings.c
+// is the app that does a bunch of timings, and vf_train.c is the
+// app that solves for the best weights (and shows how well it
+// does currently).
+
+static int stbir__should_do_vertical_first( float weights_table[STBIR_RESIZE_CLASSIFICATIONS][4], int horizontal_filter_pixel_width, float horizontal_scale, int horizontal_output_size, int vertical_filter_pixel_width, float vertical_scale, int vertical_output_size, int is_gather, STBIR__V_FIRST_INFO * info )
+{
+ double v_cost, h_cost;
+ float * weights;
+ int vertical_first;
+ int v_classification;
+
+ // categorize the resize into buckets
+ if ( ( vertical_output_size <= 4 ) || ( horizontal_output_size <= 4 ) )
+ v_classification = ( vertical_output_size < horizontal_output_size ) ? 6 : 7;
+ else if ( vertical_scale <= 1.0f )
+ v_classification = ( is_gather ) ? 1 : 0;
+ else if ( vertical_scale <= 2.0f)
+ v_classification = 2;
+ else if ( vertical_scale <= 3.0f)
+ v_classification = 3;
+ else if ( vertical_scale <= 4.0f)
+ v_classification = 5;
+ else
+ v_classification = 6;
+
+ // use the right weights
+ weights = weights_table[ v_classification ];
+
+  // these are the costs when you don't take modern CPUs (high IPC, SIMD, caches) into account - wish we had a better estimate
+ h_cost = (float)horizontal_filter_pixel_width * weights[0] + horizontal_scale * (float)vertical_filter_pixel_width * weights[1];
+ v_cost = (float)vertical_filter_pixel_width * weights[2] + vertical_scale * (float)horizontal_filter_pixel_width * weights[3];
+
+ // use computation estimate to decide vertical first or not
+ vertical_first = ( v_cost <= h_cost ) ? 1 : 0;
+
+ // save these, if requested
+ if ( info )
+ {
+ info->h_cost = h_cost;
+ info->v_cost = v_cost;
+ info->v_resize_classification = v_classification;
+ info->v_first = vertical_first;
+ info->is_gather = is_gather;
+ }
+
+ // and this allows us to override everything for testing (see dotiming.c)
+ if ( ( info ) && ( info->control_v_first ) )
+ vertical_first = ( info->control_v_first == 2 ) ? 1 : 0;
+
+ return vertical_first;
+}
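+// Worked example (hypothetical numbers, not trained weights): with 4-wide
+// filters in both directions, horizontal_scale = 2.0, vertical_scale = 0.5,
+// and weights { 1, 1, 1, 1 }:
+//   h_cost = 4*1 + 2.0*4*1 = 12
+//   v_cost = 4*1 + 0.5*4*1 = 6
+// v_cost <= h_cost, so this resize runs vertical first, matching the
+// 2x-wide / 0.5x-tall case described in the comment above.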
+
+// layout lookups - must match stbir_internal_pixel_layout
+static unsigned char stbir__pixel_channels[] = {
+ 1,2,3,3,4, // 1ch, 2ch, rgb, bgr, 4ch
+ 4,4,4,4,2,2, // RGBA,BGRA,ARGB,ABGR,RA,AR
+ 4,4,4,4,2,2, // RGBA_PM,BGRA_PM,ARGB_PM,ABGR_PM,RA_PM,AR_PM
+};
+
+// the internal pixel layout enums are in a different order, so we can easily do range comparisons of types
+// the public pixel layout is ordered in a way that if you cast num_channels (1-4) to the enum, you get something sensible
+static stbir_internal_pixel_layout stbir__pixel_layout_convert_public_to_internal[] = {
+ STBIRI_BGR, STBIRI_1CHANNEL, STBIRI_2CHANNEL, STBIRI_RGB, STBIRI_RGBA,
+ STBIRI_4CHANNEL, STBIRI_BGRA, STBIRI_ARGB, STBIRI_ABGR, STBIRI_RA, STBIRI_AR,
+ STBIRI_RGBA_PM, STBIRI_BGRA_PM, STBIRI_ARGB_PM, STBIRI_ABGR_PM, STBIRI_RA_PM, STBIRI_AR_PM,
+};
+
+static stbir__info * stbir__alloc_internal_mem_and_build_samplers( stbir__sampler * horizontal, stbir__sampler * vertical, stbir__contributors * conservative, stbir_pixel_layout input_pixel_layout_public, stbir_pixel_layout output_pixel_layout_public, int splits, int new_x, int new_y, int fast_alpha, void * user_data STBIR_ONLY_PROFILE_BUILD_GET_INFO )
+{
+ static char stbir_channel_count_index[8]={ 9,0,1,2, 3,9,9,4 };
+
+ stbir__info * info = 0;
+ void * alloced = 0;
+ size_t alloced_total = 0;
+ int vertical_first;
+ size_t decode_buffer_size, ring_buffer_length_bytes, ring_buffer_size, vertical_buffer_size;
+ int alloc_ring_buffer_num_entries;
+
+  int alpha_weighting_type = 0; // 0=none, 1=simple weight in only, 2=fancy weight+unweight, 3=simple unweight out only, 4=simple weight+unweight
+ int conservative_split_output_size = stbir__get_max_split( splits, vertical->scale_info.output_sub_size );
+ stbir_internal_pixel_layout input_pixel_layout = stbir__pixel_layout_convert_public_to_internal[ input_pixel_layout_public ];
+ stbir_internal_pixel_layout output_pixel_layout = stbir__pixel_layout_convert_public_to_internal[ output_pixel_layout_public ];
+ int channels = stbir__pixel_channels[ input_pixel_layout ];
+ int effective_channels = channels;
+
+ // first figure out what type of alpha weighting to use (if any)
+ if ( ( horizontal->filter_enum != STBIR_FILTER_POINT_SAMPLE ) || ( vertical->filter_enum != STBIR_FILTER_POINT_SAMPLE ) ) // no alpha weighting on point sampling
+ {
+ if ( ( input_pixel_layout >= STBIRI_RGBA ) && ( input_pixel_layout <= STBIRI_AR ) && ( output_pixel_layout >= STBIRI_RGBA ) && ( output_pixel_layout <= STBIRI_AR ) )
+ {
+ if ( fast_alpha )
+ {
+ alpha_weighting_type = 4;
+ }
+ else
+ {
+ static int fancy_alpha_effective_cnts[6] = { 7, 7, 7, 7, 3, 3 };
+ alpha_weighting_type = 2;
+ effective_channels = fancy_alpha_effective_cnts[ input_pixel_layout - STBIRI_RGBA ];
+ }
+ }
+ else if ( ( input_pixel_layout >= STBIRI_RGBA_PM ) && ( input_pixel_layout <= STBIRI_AR_PM ) && ( output_pixel_layout >= STBIRI_RGBA ) && ( output_pixel_layout <= STBIRI_AR ) )
+ {
+ // input premult, output non-premult
+ alpha_weighting_type = 3;
+ }
+ else if ( ( input_pixel_layout >= STBIRI_RGBA ) && ( input_pixel_layout <= STBIRI_AR ) && ( output_pixel_layout >= STBIRI_RGBA_PM ) && ( output_pixel_layout <= STBIRI_AR_PM ) )
+ {
+ // input non-premult, output premult
+ alpha_weighting_type = 1;
+ }
+ }
+
+ // channel in and out count must match currently
+ if ( channels != stbir__pixel_channels[ output_pixel_layout ] )
+ return 0;
+
+ // get vertical first
+ vertical_first = stbir__should_do_vertical_first( stbir__compute_weights[ (int)stbir_channel_count_index[ effective_channels ] ], horizontal->filter_pixel_width, horizontal->scale_info.scale, horizontal->scale_info.output_sub_size, vertical->filter_pixel_width, vertical->scale_info.scale, vertical->scale_info.output_sub_size, vertical->is_gather, STBIR__V_FIRST_INFO_POINTER );
+
+  // we sometimes read one float past the end in some of the unrolled loops (with a zero weight coefficient, so it has no effect).
+  // we use a few extra floats instead of just 1, so that the input callback buffer can overlap with the decode buffer without
+  // the conversion routines overwriting the callback input data.
+ decode_buffer_size = ( conservative->n1 - conservative->n0 + 1 ) * effective_channels * sizeof(float) + sizeof(float)*STBIR_INPUT_CALLBACK_PADDING; // extra floats for input callback stagger
+
+#if defined( STBIR__SEPARATE_ALLOCATIONS ) && defined(STBIR_SIMD8)
+ if ( effective_channels == 3 )
+ decode_buffer_size += sizeof(float); // avx in 3 channel mode needs one float at the start of the buffer (only with separate allocations)
+#endif
+
+ ring_buffer_length_bytes = (size_t)horizontal->scale_info.output_sub_size * (size_t)effective_channels * sizeof(float) + sizeof(float)*STBIR_INPUT_CALLBACK_PADDING; // extra floats for padding
+
+ // if we do vertical first, the ring buffer holds a whole decoded line
+ if ( vertical_first )
+ ring_buffer_length_bytes = ( decode_buffer_size + 15 ) & ~15;
+
+ if ( ( ring_buffer_length_bytes & 4095 ) == 0 ) ring_buffer_length_bytes += 64*3; // avoid 4k alias
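+  // (for example, a hypothetical 8192 byte scanline becomes 8384 bytes, so
+  //  successive ring buffer lines don't all land on the same cache sets)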
+
+ // One extra entry because floating point precision problems sometimes cause an extra to be necessary.
+ alloc_ring_buffer_num_entries = vertical->filter_pixel_width + 1;
+
+ // we never need more ring buffer entries than the scanlines we're outputting when in scatter mode
+ if ( ( !vertical->is_gather ) && ( alloc_ring_buffer_num_entries > conservative_split_output_size ) )
+ alloc_ring_buffer_num_entries = conservative_split_output_size;
+
+ ring_buffer_size = (size_t)alloc_ring_buffer_num_entries * (size_t)ring_buffer_length_bytes;
+
+ // The vertical buffer is used differently, depending on whether we are scattering
+ // the vertical scanlines, or gathering them.
+  // If scattering, it's used as the temp buffer to accumulate each output.
+ // If gathering, it's just the output buffer.
+ vertical_buffer_size = (size_t)horizontal->scale_info.output_sub_size * (size_t)effective_channels * sizeof(float) + sizeof(float); // extra float for padding
+
+ // we make two passes through this loop, 1st to add everything up, 2nd to allocate and init
+ for(;;)
+ {
+ int i;
+ void * advance_mem = alloced;
+ int copy_horizontal = 0;
+ stbir__sampler * possibly_use_horizontal_for_pivot = 0;
+
+#ifdef STBIR__SEPARATE_ALLOCATIONS
+ #define STBIR__NEXT_PTR( ptr, size, ntype ) if ( alloced ) { void * p = STBIR_MALLOC( size, user_data); if ( p == 0 ) { stbir__free_internal_mem( info ); return 0; } (ptr) = (ntype*)p; }
+#else
+ #define STBIR__NEXT_PTR( ptr, size, ntype ) advance_mem = (void*) ( ( ((size_t)advance_mem) + 15 ) & ~15 ); if ( alloced ) ptr = (ntype*)advance_mem; advance_mem = (char*)(((size_t)advance_mem) + (size));
+#endif
+
+ STBIR__NEXT_PTR( info, sizeof( stbir__info ), stbir__info );
+
+ STBIR__NEXT_PTR( info->split_info, sizeof( stbir__per_split_info ) * splits, stbir__per_split_info );
+
+ if ( info )
+ {
+ static stbir__alpha_weight_func * fancy_alpha_weights[6] = { stbir__fancy_alpha_weight_4ch, stbir__fancy_alpha_weight_4ch, stbir__fancy_alpha_weight_4ch, stbir__fancy_alpha_weight_4ch, stbir__fancy_alpha_weight_2ch, stbir__fancy_alpha_weight_2ch };
+ static stbir__alpha_unweight_func * fancy_alpha_unweights[6] = { stbir__fancy_alpha_unweight_4ch, stbir__fancy_alpha_unweight_4ch, stbir__fancy_alpha_unweight_4ch, stbir__fancy_alpha_unweight_4ch, stbir__fancy_alpha_unweight_2ch, stbir__fancy_alpha_unweight_2ch };
+ static stbir__alpha_weight_func * simple_alpha_weights[6] = { stbir__simple_alpha_weight_4ch, stbir__simple_alpha_weight_4ch, stbir__simple_alpha_weight_4ch, stbir__simple_alpha_weight_4ch, stbir__simple_alpha_weight_2ch, stbir__simple_alpha_weight_2ch };
+ static stbir__alpha_unweight_func * simple_alpha_unweights[6] = { stbir__simple_alpha_unweight_4ch, stbir__simple_alpha_unweight_4ch, stbir__simple_alpha_unweight_4ch, stbir__simple_alpha_unweight_4ch, stbir__simple_alpha_unweight_2ch, stbir__simple_alpha_unweight_2ch };
+
+ // initialize info fields
+ info->alloced_mem = alloced;
+ info->alloced_total = alloced_total;
+
+ info->channels = channels;
+ info->effective_channels = effective_channels;
+
+ info->offset_x = new_x;
+ info->offset_y = new_y;
+ info->alloc_ring_buffer_num_entries = (int)alloc_ring_buffer_num_entries;
+ info->ring_buffer_num_entries = 0;
+ info->ring_buffer_length_bytes = (int)ring_buffer_length_bytes;
+ info->splits = splits;
+ info->vertical_first = vertical_first;
+
+ info->input_pixel_layout_internal = input_pixel_layout;
+ info->output_pixel_layout_internal = output_pixel_layout;
+
+ // setup alpha weight functions
+ info->alpha_weight = 0;
+ info->alpha_unweight = 0;
+
+ // handle alpha weighting functions and overrides
+ if ( alpha_weighting_type == 2 )
+ {
+ // high quality alpha multiplying on the way in, dividing on the way out
+ info->alpha_weight = fancy_alpha_weights[ input_pixel_layout - STBIRI_RGBA ];
+ info->alpha_unweight = fancy_alpha_unweights[ output_pixel_layout - STBIRI_RGBA ];
+ }
+ else if ( alpha_weighting_type == 4 )
+ {
+ // fast alpha multiplying on the way in, dividing on the way out
+ info->alpha_weight = simple_alpha_weights[ input_pixel_layout - STBIRI_RGBA ];
+ info->alpha_unweight = simple_alpha_unweights[ output_pixel_layout - STBIRI_RGBA ];
+ }
+ else if ( alpha_weighting_type == 1 )
+ {
+ // fast alpha on the way in, leave in premultiplied form on way out
+ info->alpha_weight = simple_alpha_weights[ input_pixel_layout - STBIRI_RGBA ];
+ }
+ else if ( alpha_weighting_type == 3 )
+ {
+ // incoming is premultiplied, fast alpha dividing on the way out - non-premultiplied output
+ info->alpha_unweight = simple_alpha_unweights[ output_pixel_layout - STBIRI_RGBA ];
+ }
+
+ // handle 3-chan color flipping, using the alpha weight path
+ if ( ( ( input_pixel_layout == STBIRI_RGB ) && ( output_pixel_layout == STBIRI_BGR ) ) ||
+ ( ( input_pixel_layout == STBIRI_BGR ) && ( output_pixel_layout == STBIRI_RGB ) ) )
+ {
+ // do the flipping on the smaller of the two ends
+ if ( horizontal->scale_info.scale < 1.0f )
+ info->alpha_unweight = stbir__simple_flip_3ch;
+ else
+ info->alpha_weight = stbir__simple_flip_3ch;
+ }
+
+ }
+
+ // get all the per-split buffers
+ for( i = 0 ; i < splits ; i++ )
+ {
+ STBIR__NEXT_PTR( info->split_info[i].decode_buffer, decode_buffer_size, float );
+
+#ifdef STBIR__SEPARATE_ALLOCATIONS
+
+ #ifdef STBIR_SIMD8
+ if ( ( info ) && ( effective_channels == 3 ) )
+ ++info->split_info[i].decode_buffer; // avx in 3 channel mode needs one float at the start of the buffer
+ #endif
+
+ STBIR__NEXT_PTR( info->split_info[i].ring_buffers, alloc_ring_buffer_num_entries * sizeof(float*), float* );
+ {
+ int j;
+ for( j = 0 ; j < alloc_ring_buffer_num_entries ; j++ )
+ {
+ STBIR__NEXT_PTR( info->split_info[i].ring_buffers[j], ring_buffer_length_bytes, float );
+ #ifdef STBIR_SIMD8
+ if ( ( info ) && ( effective_channels == 3 ) )
+ ++info->split_info[i].ring_buffers[j]; // avx in 3 channel mode needs one float at the start of the buffer
+ #endif
+ }
+ }
+#else
+ STBIR__NEXT_PTR( info->split_info[i].ring_buffer, ring_buffer_size, float );
+#endif
+ STBIR__NEXT_PTR( info->split_info[i].vertical_buffer, vertical_buffer_size, float );
+ }
+
+ // alloc memory for to-be-pivoted coeffs (if necessary)
+ if ( vertical->is_gather == 0 )
+ {
+ size_t both;
+ size_t temp_mem_amt;
+
+      // when in vertical scatter mode, we first build the coefficients in gather mode, and then pivot afterwards.
+      // that means we need two buffers, so we try to use the decode buffer and ring buffer for this. if that
+      // is too small, we just allocate extra memory to use as this temp.
+
+ both = (size_t)vertical->gather_prescatter_contributors_size + (size_t)vertical->gather_prescatter_coefficients_size;
+
+#ifdef STBIR__SEPARATE_ALLOCATIONS
+ temp_mem_amt = decode_buffer_size;
+
+ #ifdef STBIR_SIMD8
+ if ( effective_channels == 3 )
+ --temp_mem_amt; // avx in 3 channel mode needs one float at the start of the buffer
+ #endif
+#else
+ temp_mem_amt = (size_t)( decode_buffer_size + ring_buffer_size + vertical_buffer_size ) * (size_t)splits;
+#endif
+ if ( temp_mem_amt >= both )
+ {
+ if ( info )
+ {
+ vertical->gather_prescatter_contributors = (stbir__contributors*)info->split_info[0].decode_buffer;
+ vertical->gather_prescatter_coefficients = (float*) ( ( (char*)info->split_info[0].decode_buffer ) + vertical->gather_prescatter_contributors_size );
+ }
+ }
+ else
+ {
+ // ring+decode memory is too small, so allocate temp memory
+ STBIR__NEXT_PTR( vertical->gather_prescatter_contributors, vertical->gather_prescatter_contributors_size, stbir__contributors );
+ STBIR__NEXT_PTR( vertical->gather_prescatter_coefficients, vertical->gather_prescatter_coefficients_size, float );
+ }
+ }
+
+ STBIR__NEXT_PTR( horizontal->contributors, horizontal->contributors_size, stbir__contributors );
+ STBIR__NEXT_PTR( horizontal->coefficients, horizontal->coefficients_size, float );
+
+ // are the two filters identical?? (happens a lot with mipmap generation)
+ if ( ( horizontal->filter_kernel == vertical->filter_kernel ) && ( horizontal->filter_support == vertical->filter_support ) && ( horizontal->edge == vertical->edge ) && ( horizontal->scale_info.output_sub_size == vertical->scale_info.output_sub_size ) )
+ {
+ float diff_scale = horizontal->scale_info.scale - vertical->scale_info.scale;
+ float diff_shift = horizontal->scale_info.pixel_shift - vertical->scale_info.pixel_shift;
+ if ( diff_scale < 0.0f ) diff_scale = -diff_scale;
+ if ( diff_shift < 0.0f ) diff_shift = -diff_shift;
+ if ( ( diff_scale <= stbir__small_float ) && ( diff_shift <= stbir__small_float ) )
+ {
+ if ( horizontal->is_gather == vertical->is_gather )
+ {
+ copy_horizontal = 1;
+ goto no_vert_alloc;
+ }
+ // everything matches, but vertical is scatter, horizontal is gather, use horizontal coeffs for vertical pivot coeffs
+ possibly_use_horizontal_for_pivot = horizontal;
+ }
+ }
+
+ STBIR__NEXT_PTR( vertical->contributors, vertical->contributors_size, stbir__contributors );
+ STBIR__NEXT_PTR( vertical->coefficients, vertical->coefficients_size, float );
+
+ no_vert_alloc:
+
+ if ( info )
+ {
+ STBIR_PROFILE_BUILD_START( horizontal );
+
+ stbir__calculate_filters( horizontal, 0, user_data STBIR_ONLY_PROFILE_BUILD_SET_INFO );
+
+ // setup the horizontal gather functions
+ // start with defaulting to the n_coeffs functions (specialized on channels and remnant leftover)
+ info->horizontal_gather_channels = stbir__horizontal_gather_n_coeffs_funcs[ effective_channels ][ horizontal->extent_info.widest & 3 ];
+ // but if the number of coeffs <= 12, use another set of special cases. <=12 coeffs is any enlarging resize, or shrinking resize down to about 1/3 size
+ if ( horizontal->extent_info.widest <= 12 )
+ info->horizontal_gather_channels = stbir__horizontal_gather_channels_funcs[ effective_channels ][ horizontal->extent_info.widest - 1 ];
+
+ info->scanline_extents.conservative.n0 = conservative->n0;
+ info->scanline_extents.conservative.n1 = conservative->n1;
+
+ // get exact extents
+ stbir__get_extents( horizontal, &info->scanline_extents );
+
+ // pack the horizontal coeffs
+ horizontal->coefficient_width = stbir__pack_coefficients(horizontal->num_contributors, horizontal->contributors, horizontal->coefficients, horizontal->coefficient_width, horizontal->extent_info.widest, info->scanline_extents.conservative.n0, info->scanline_extents.conservative.n1 );
+
+ STBIR_MEMCPY( &info->horizontal, horizontal, sizeof( stbir__sampler ) );
+
+ STBIR_PROFILE_BUILD_END( horizontal );
+
+ if ( copy_horizontal )
+ {
+ STBIR_MEMCPY( &info->vertical, horizontal, sizeof( stbir__sampler ) );
+ }
+ else
+ {
+ STBIR_PROFILE_BUILD_START( vertical );
+
+ stbir__calculate_filters( vertical, possibly_use_horizontal_for_pivot, user_data STBIR_ONLY_PROFILE_BUILD_SET_INFO );
+ STBIR_MEMCPY( &info->vertical, vertical, sizeof( stbir__sampler ) );
+
+ STBIR_PROFILE_BUILD_END( vertical );
+ }
+
+ // setup the vertical split ranges
+ stbir__get_split_info( info->split_info, info->splits, info->vertical.scale_info.output_sub_size, info->vertical.filter_pixel_margin, info->vertical.scale_info.input_full_size, info->vertical.is_gather, info->vertical.contributors );
+
+ // now we know precisely how many entries we need
+ info->ring_buffer_num_entries = info->vertical.extent_info.widest;
+
+ // we never need more ring buffer entries than the scanlines we're outputting
+ if ( ( !info->vertical.is_gather ) && ( info->ring_buffer_num_entries > conservative_split_output_size ) )
+ info->ring_buffer_num_entries = conservative_split_output_size;
+ STBIR_ASSERT( info->ring_buffer_num_entries <= info->alloc_ring_buffer_num_entries );
+ }
+ #undef STBIR__NEXT_PTR
+
+
+    // is this the first time through the loop?
+ if ( info == 0 )
+ {
+ alloced_total = ( 15 + (size_t)advance_mem );
+ alloced = STBIR_MALLOC( alloced_total, user_data );
+ if ( alloced == 0 )
+ return 0;
+ }
+ else
+ return info; // success
+ }
+}
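+// Note the two-pass pattern above: on the first pass alloced == 0, so
+// STBIR__NEXT_PTR only advances advance_mem (16-byte aligning each block) to
+// total up the bytes needed; one STBIR_MALLOC then backs the second pass,
+// which hands out the now-real pointers at the same offsets.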
+
+static int stbir__perform_resize( stbir__info const * info, int split_start, int split_count )
+{
+ stbir__per_split_info * split_info = info->split_info + split_start;
+
+ STBIR_PROFILE_CLEAR_EXTRAS();
+
+ STBIR_PROFILE_FIRST_START( looping );
+ if (info->vertical.is_gather)
+ stbir__vertical_gather_loop( info, split_info, split_count );
+ else
+ stbir__vertical_scatter_loop( info, split_info, split_count );
+ STBIR_PROFILE_END( looping );
+
+ return 1;
+}
+
+static void stbir__update_info_from_resize( stbir__info * info, STBIR_RESIZE * resize )
+{
+ static stbir__decode_pixels_func * decode_simple[STBIR_TYPE_HALF_FLOAT-STBIR_TYPE_UINT8_SRGB+1]=
+ {
+ /* 1ch-4ch */ stbir__decode_uint8_srgb, stbir__decode_uint8_srgb, 0, stbir__decode_float_linear, stbir__decode_half_float_linear,
+ };
+
+ static stbir__decode_pixels_func * decode_alphas[STBIRI_AR-STBIRI_RGBA+1][STBIR_TYPE_HALF_FLOAT-STBIR_TYPE_UINT8_SRGB+1]=
+ {
+ { /* RGBA */ stbir__decode_uint8_srgb4_linearalpha, stbir__decode_uint8_srgb, 0, stbir__decode_float_linear, stbir__decode_half_float_linear },
+ { /* BGRA */ stbir__decode_uint8_srgb4_linearalpha_BGRA, stbir__decode_uint8_srgb_BGRA, 0, stbir__decode_float_linear_BGRA, stbir__decode_half_float_linear_BGRA },
+ { /* ARGB */ stbir__decode_uint8_srgb4_linearalpha_ARGB, stbir__decode_uint8_srgb_ARGB, 0, stbir__decode_float_linear_ARGB, stbir__decode_half_float_linear_ARGB },
+ { /* ABGR */ stbir__decode_uint8_srgb4_linearalpha_ABGR, stbir__decode_uint8_srgb_ABGR, 0, stbir__decode_float_linear_ABGR, stbir__decode_half_float_linear_ABGR },
+ { /* RA */ stbir__decode_uint8_srgb2_linearalpha, stbir__decode_uint8_srgb, 0, stbir__decode_float_linear, stbir__decode_half_float_linear },
+ { /* AR */ stbir__decode_uint8_srgb2_linearalpha_AR, stbir__decode_uint8_srgb_AR, 0, stbir__decode_float_linear_AR, stbir__decode_half_float_linear_AR },
+ };
+
+ static stbir__decode_pixels_func * decode_simple_scaled_or_not[2][2]=
+ {
+ { stbir__decode_uint8_linear_scaled, stbir__decode_uint8_linear }, { stbir__decode_uint16_linear_scaled, stbir__decode_uint16_linear },
+ };
+
+ static stbir__decode_pixels_func * decode_alphas_scaled_or_not[STBIRI_AR-STBIRI_RGBA+1][2][2]=
+ {
+ { /* RGBA */ { stbir__decode_uint8_linear_scaled, stbir__decode_uint8_linear }, { stbir__decode_uint16_linear_scaled, stbir__decode_uint16_linear } },
+ { /* BGRA */ { stbir__decode_uint8_linear_scaled_BGRA, stbir__decode_uint8_linear_BGRA }, { stbir__decode_uint16_linear_scaled_BGRA, stbir__decode_uint16_linear_BGRA } },
+ { /* ARGB */ { stbir__decode_uint8_linear_scaled_ARGB, stbir__decode_uint8_linear_ARGB }, { stbir__decode_uint16_linear_scaled_ARGB, stbir__decode_uint16_linear_ARGB } },
+ { /* ABGR */ { stbir__decode_uint8_linear_scaled_ABGR, stbir__decode_uint8_linear_ABGR }, { stbir__decode_uint16_linear_scaled_ABGR, stbir__decode_uint16_linear_ABGR } },
+ { /* RA */ { stbir__decode_uint8_linear_scaled, stbir__decode_uint8_linear }, { stbir__decode_uint16_linear_scaled, stbir__decode_uint16_linear } },
+ { /* AR */ { stbir__decode_uint8_linear_scaled_AR, stbir__decode_uint8_linear_AR }, { stbir__decode_uint16_linear_scaled_AR, stbir__decode_uint16_linear_AR } }
+ };
+
+ static stbir__encode_pixels_func * encode_simple[STBIR_TYPE_HALF_FLOAT-STBIR_TYPE_UINT8_SRGB+1]=
+ {
+ /* 1ch-4ch */ stbir__encode_uint8_srgb, stbir__encode_uint8_srgb, 0, stbir__encode_float_linear, stbir__encode_half_float_linear,
+ };
+
+ static stbir__encode_pixels_func * encode_alphas[STBIRI_AR-STBIRI_RGBA+1][STBIR_TYPE_HALF_FLOAT-STBIR_TYPE_UINT8_SRGB+1]=
+ {
+ { /* RGBA */ stbir__encode_uint8_srgb4_linearalpha, stbir__encode_uint8_srgb, 0, stbir__encode_float_linear, stbir__encode_half_float_linear },
+ { /* BGRA */ stbir__encode_uint8_srgb4_linearalpha_BGRA, stbir__encode_uint8_srgb_BGRA, 0, stbir__encode_float_linear_BGRA, stbir__encode_half_float_linear_BGRA },
+ { /* ARGB */ stbir__encode_uint8_srgb4_linearalpha_ARGB, stbir__encode_uint8_srgb_ARGB, 0, stbir__encode_float_linear_ARGB, stbir__encode_half_float_linear_ARGB },
+ { /* ABGR */ stbir__encode_uint8_srgb4_linearalpha_ABGR, stbir__encode_uint8_srgb_ABGR, 0, stbir__encode_float_linear_ABGR, stbir__encode_half_float_linear_ABGR },
+ { /* RA */ stbir__encode_uint8_srgb2_linearalpha, stbir__encode_uint8_srgb, 0, stbir__encode_float_linear, stbir__encode_half_float_linear },
+ { /* AR */ stbir__encode_uint8_srgb2_linearalpha_AR, stbir__encode_uint8_srgb_AR, 0, stbir__encode_float_linear_AR, stbir__encode_half_float_linear_AR }
+ };
+
+ static stbir__encode_pixels_func * encode_simple_scaled_or_not[2][2]=
+ {
+ { stbir__encode_uint8_linear_scaled, stbir__encode_uint8_linear }, { stbir__encode_uint16_linear_scaled, stbir__encode_uint16_linear },
+ };
+
+ static stbir__encode_pixels_func * encode_alphas_scaled_or_not[STBIRI_AR-STBIRI_RGBA+1][2][2]=
+ {
+ { /* RGBA */ { stbir__encode_uint8_linear_scaled, stbir__encode_uint8_linear }, { stbir__encode_uint16_linear_scaled, stbir__encode_uint16_linear } },
+ { /* BGRA */ { stbir__encode_uint8_linear_scaled_BGRA, stbir__encode_uint8_linear_BGRA }, { stbir__encode_uint16_linear_scaled_BGRA, stbir__encode_uint16_linear_BGRA } },
+ { /* ARGB */ { stbir__encode_uint8_linear_scaled_ARGB, stbir__encode_uint8_linear_ARGB }, { stbir__encode_uint16_linear_scaled_ARGB, stbir__encode_uint16_linear_ARGB } },
+ { /* ABGR */ { stbir__encode_uint8_linear_scaled_ABGR, stbir__encode_uint8_linear_ABGR }, { stbir__encode_uint16_linear_scaled_ABGR, stbir__encode_uint16_linear_ABGR } },
+ { /* RA */ { stbir__encode_uint8_linear_scaled, stbir__encode_uint8_linear }, { stbir__encode_uint16_linear_scaled, stbir__encode_uint16_linear } },
+ { /* AR */ { stbir__encode_uint8_linear_scaled_AR, stbir__encode_uint8_linear_AR }, { stbir__encode_uint16_linear_scaled_AR, stbir__encode_uint16_linear_AR } }
+ };
+
+ stbir__decode_pixels_func * decode_pixels = 0;
+ stbir__encode_pixels_func * encode_pixels = 0;
+ stbir_datatype input_type, output_type;
+
+ input_type = resize->input_data_type;
+ output_type = resize->output_data_type;
+ info->input_data = resize->input_pixels;
+ info->input_stride_bytes = resize->input_stride_in_bytes;
+ info->output_stride_bytes = resize->output_stride_in_bytes;
+
+ // if we're completely point sampling, then we can turn off SRGB
+ if ( ( info->horizontal.filter_enum == STBIR_FILTER_POINT_SAMPLE ) && ( info->vertical.filter_enum == STBIR_FILTER_POINT_SAMPLE ) )
+ {
+ if ( ( ( input_type == STBIR_TYPE_UINT8_SRGB ) || ( input_type == STBIR_TYPE_UINT8_SRGB_ALPHA ) ) &&
+ ( ( output_type == STBIR_TYPE_UINT8_SRGB ) || ( output_type == STBIR_TYPE_UINT8_SRGB_ALPHA ) ) )
+ {
+ input_type = STBIR_TYPE_UINT8;
+ output_type = STBIR_TYPE_UINT8;
+ }
+ }
+
+ // recalc the output and input strides
+ if ( info->input_stride_bytes == 0 )
+ info->input_stride_bytes = info->channels * info->horizontal.scale_info.input_full_size * stbir__type_size[input_type];
+
+ if ( info->output_stride_bytes == 0 )
+ info->output_stride_bytes = info->channels * info->horizontal.scale_info.output_sub_size * stbir__type_size[output_type];
+
+ // calc offset
+ info->output_data = ( (char*) resize->output_pixels ) + ( (size_t) info->offset_y * (size_t) resize->output_stride_in_bytes ) + ( info->offset_x * info->channels * stbir__type_size[output_type] );
+
+ info->in_pixels_cb = resize->input_cb;
+ info->user_data = resize->user_data;
+ info->out_pixels_cb = resize->output_cb;
+
+ // setup the input format converters
+ if ( ( input_type == STBIR_TYPE_UINT8 ) || ( input_type == STBIR_TYPE_UINT16 ) )
+ {
+ int non_scaled = 0;
+
+ // check if we can run unscaled - 0-255.0/0-65535.0 instead of 0-1.0 (which is a tiny bit faster when doing linear 8->8 or 16->16)
+ if ( ( !info->alpha_weight ) && ( !info->alpha_unweight ) ) // don't short circuit when alpha weighting (get everything to 0-1.0 as usual)
+ if ( ( ( input_type == STBIR_TYPE_UINT8 ) && ( output_type == STBIR_TYPE_UINT8 ) ) || ( ( input_type == STBIR_TYPE_UINT16 ) && ( output_type == STBIR_TYPE_UINT16 ) ) )
+ non_scaled = 1;
+
+ if ( info->input_pixel_layout_internal <= STBIRI_4CHANNEL )
+ decode_pixels = decode_simple_scaled_or_not[ input_type == STBIR_TYPE_UINT16 ][ non_scaled ];
+ else
+ decode_pixels = decode_alphas_scaled_or_not[ ( info->input_pixel_layout_internal - STBIRI_RGBA ) % ( STBIRI_AR-STBIRI_RGBA+1 ) ][ input_type == STBIR_TYPE_UINT16 ][ non_scaled ];
+ }
+ else
+ {
+ if ( info->input_pixel_layout_internal <= STBIRI_4CHANNEL )
+ decode_pixels = decode_simple[ input_type - STBIR_TYPE_UINT8_SRGB ];
+ else
+ decode_pixels = decode_alphas[ ( info->input_pixel_layout_internal - STBIRI_RGBA ) % ( STBIRI_AR-STBIRI_RGBA+1 ) ][ input_type - STBIR_TYPE_UINT8_SRGB ];
+ }
+
+ // setup the output format converters
+ if ( ( output_type == STBIR_TYPE_UINT8 ) || ( output_type == STBIR_TYPE_UINT16 ) )
+ {
+ int non_scaled = 0;
+
+ // check if we can run unscaled - 0-255.0/0-65535.0 instead of 0-1.0 (which is a tiny bit faster when doing linear 8->8 or 16->16)
+ if ( ( !info->alpha_weight ) && ( !info->alpha_unweight ) ) // don't short circuit when alpha weighting (get everything to 0-1.0 as usual)
+ if ( ( ( input_type == STBIR_TYPE_UINT8 ) && ( output_type == STBIR_TYPE_UINT8 ) ) || ( ( input_type == STBIR_TYPE_UINT16 ) && ( output_type == STBIR_TYPE_UINT16 ) ) )
+ non_scaled = 1;
+
+ if ( info->output_pixel_layout_internal <= STBIRI_4CHANNEL )
+ encode_pixels = encode_simple_scaled_or_not[ output_type == STBIR_TYPE_UINT16 ][ non_scaled ];
+ else
+ encode_pixels = encode_alphas_scaled_or_not[ ( info->output_pixel_layout_internal - STBIRI_RGBA ) % ( STBIRI_AR-STBIRI_RGBA+1 ) ][ output_type == STBIR_TYPE_UINT16 ][ non_scaled ];
+ }
+ else
+ {
+ if ( info->output_pixel_layout_internal <= STBIRI_4CHANNEL )
+ encode_pixels = encode_simple[ output_type - STBIR_TYPE_UINT8_SRGB ];
+ else
+ encode_pixels = encode_alphas[ ( info->output_pixel_layout_internal - STBIRI_RGBA ) % ( STBIRI_AR-STBIRI_RGBA+1 ) ][ output_type - STBIR_TYPE_UINT8_SRGB ];
+ }
+
+ info->input_type = input_type;
+ info->output_type = output_type;
+ info->decode_pixels = decode_pixels;
+ info->encode_pixels = encode_pixels;
+}
+
+static void stbir__clip( int * outx, int * outsubw, int outw, double * u0, double * u1 )
+{
+ double per, adj;
+ int over;
+
+ // do left/top edge
+ if ( *outx < 0 )
+ {
+ per = ( (double)*outx ) / ( (double)*outsubw ); // is negative
+ adj = per * ( *u1 - *u0 );
+ *u0 -= adj; // increases u0
+ *outx = 0;
+ }
+
+ // do right/bot edge
+ over = outw - ( *outx + *outsubw );
+ if ( over < 0 )
+ {
+ per = ( (double)over ) / ( (double)*outsubw ); // is negative
+ adj = per * ( *u1 - *u0 );
+ *u1 += adj; // decreases u1
+ *outsubw = outw - *outx;
+ }
+}
+
+// converts a double to a rational that has less than one float bit of error (returns 0 if unable to do so)
+static int stbir__double_to_rational(double f, stbir_uint32 limit, stbir_uint32 *numer, stbir_uint32 *denom, int limit_denom ) // limit_denom (1) or limit numer (0)
+{
+ double err;
+ stbir_uint64 top, bot;
+ stbir_uint64 numer_last = 0;
+ stbir_uint64 denom_last = 1;
+ stbir_uint64 numer_estimate = 1;
+ stbir_uint64 denom_estimate = 0;
+
+ // scale to past float error range
+ top = (stbir_uint64)( f * (double)(1 << 25) );
+ bot = 1 << 25;
+
+ // keep refining; this usually converges within a few loops - around 5 for bad cases
+ for(;;)
+ {
+ stbir_uint64 est, temp;
+
+ // hit limit, break out and do best full range estimate
+ if ( ( ( limit_denom ) ? denom_estimate : numer_estimate ) >= limit )
+ break;
+
+ // is the current error less than 1 bit of a float? if so, we're done
+ if ( denom_estimate )
+ {
+ err = ( (double)numer_estimate / (double)denom_estimate ) - f;
+ if ( err < 0.0 ) err = -err;
+ if ( err < ( 1.0 / (double)(1<<24) ) )
+ {
+ // yup, found it
+ *numer = (stbir_uint32) numer_estimate;
+ *denom = (stbir_uint32) denom_estimate;
+ return 1;
+ }
+ }
+
+ // no more refinement bits left? break out and do full range estimate
+ if ( bot == 0 )
+ break;
+
+ // gcd the estimate bits
+ est = top / bot;
+ temp = top % bot;
+ top = bot;
+ bot = temp;
+
+ // move remainders
+ temp = est * denom_estimate + denom_last;
+ denom_last = denom_estimate;
+ denom_estimate = temp;
+
+ // move remainders
+ temp = est * numer_estimate + numer_last;
+ numer_last = numer_estimate;
+ numer_estimate = temp;
+ }
+
+ // we didn't find anything good enough for float, use a full range estimate
+ if ( limit_denom )
+ {
+ numer_estimate = (stbir_uint64)( f * (double)limit + 0.5 );
+ denom_estimate = limit;
+ }
+ else
+ {
+ numer_estimate = limit;
+ denom_estimate = (stbir_uint64)( ( (double)limit / f ) + 0.5 );
+ }
+
+ *numer = (stbir_uint32) numer_estimate;
+ *denom = (stbir_uint32) denom_estimate;
+
+ err = ( denom_estimate ) ? ( ( (double)(stbir_uint32)numer_estimate / (double)(stbir_uint32)denom_estimate ) - f ) : 1.0;
+ if ( err < 0.0 ) err = -err;
+ return ( err < ( 1.0 / (double)(1<<24) ) ) ? 1 : 0;
+}
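+
+// Illustrative note (editor's sketch, not in the original): the continued-fraction
+// refinement above recovers simple ratios exactly when they fit in the limit, e.g.
+//   stbir__double_to_rational( 2.0/3.0, 1<<16, &n, &d, 0 )  // n==2, d==3, returns 1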
+
+static int stbir__calculate_region_transform( stbir__scale_info * scale_info, int output_full_range, int * output_offset, int output_sub_range, int input_full_range, double input_s0, double input_s1 )
+{
+ double output_range, input_range, output_s, input_s, ratio, scale;
+
+ input_s = input_s1 - input_s0;
+
+ // null area
+ if ( ( output_full_range == 0 ) || ( input_full_range == 0 ) ||
+ ( output_sub_range == 0 ) || ( input_s <= stbir__small_float ) )
+ return 0;
+
+ // are either of the ranges completely out of bounds?
+ if ( ( *output_offset >= output_full_range ) || ( ( *output_offset + output_sub_range ) <= 0 ) || ( input_s0 >= (1.0f-stbir__small_float) ) || ( input_s1 <= stbir__small_float ) )
+ return 0;
+
+ output_range = (double)output_full_range;
+ input_range = (double)input_full_range;
+
+ output_s = ( (double)output_sub_range) / output_range;
+
+ // figure out the scaling to use
+ ratio = output_s / input_s;
+
+ // save scale before clipping
+ scale = ( output_range / input_range ) * ratio;
+ scale_info->scale = (float)scale;
+ scale_info->inv_scale = (float)( 1.0 / scale );
+
+ // clip output area to left/right output edges (and adjust input area)
+ stbir__clip( output_offset, &output_sub_range, output_full_range, &input_s0, &input_s1 );
+
+ // recalc input area
+ input_s = input_s1 - input_s0;
+
+ // after clipping do we have zero input area?
+ if ( input_s <= stbir__small_float )
+ return 0;
+
+ // calculate and store the starting source offsets in output pixel space
+ scale_info->pixel_shift = (float) ( input_s0 * ratio * output_range );
+
+ scale_info->scale_is_rational = stbir__double_to_rational( scale, ( scale <= 1.0 ) ? output_full_range : input_full_range, &scale_info->scale_numerator, &scale_info->scale_denominator, ( scale >= 1.0 ) );
+
+ scale_info->input_full_size = input_full_range;
+ scale_info->output_sub_size = output_sub_range;
+
+ return 1;
+}
+
+
+static void stbir__init_and_set_layout( STBIR_RESIZE * resize, stbir_pixel_layout pixel_layout, stbir_datatype data_type )
+{
+ resize->input_cb = 0;
+ resize->output_cb = 0;
+ resize->user_data = resize;
+ resize->samplers = 0;
+ resize->called_alloc = 0;
+ resize->horizontal_filter = STBIR_FILTER_DEFAULT;
+ resize->horizontal_filter_kernel = 0; resize->horizontal_filter_support = 0;
+ resize->vertical_filter = STBIR_FILTER_DEFAULT;
+ resize->vertical_filter_kernel = 0; resize->vertical_filter_support = 0;
+ resize->horizontal_edge = STBIR_EDGE_CLAMP;
+ resize->vertical_edge = STBIR_EDGE_CLAMP;
+ resize->input_s0 = 0; resize->input_t0 = 0; resize->input_s1 = 1; resize->input_t1 = 1;
+ resize->output_subx = 0; resize->output_suby = 0; resize->output_subw = resize->output_w; resize->output_subh = resize->output_h;
+ resize->input_data_type = data_type;
+ resize->output_data_type = data_type;
+ resize->input_pixel_layout_public = pixel_layout;
+ resize->output_pixel_layout_public = pixel_layout;
+ resize->needs_rebuild = 1;
+}
+
+STBIRDEF void stbir_resize_init( STBIR_RESIZE * resize,
+ const void *input_pixels, int input_w, int input_h, int input_stride_in_bytes, // stride can be zero
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes, // stride can be zero
+ stbir_pixel_layout pixel_layout, stbir_datatype data_type )
+{
+ resize->input_pixels = input_pixels;
+ resize->input_w = input_w;
+ resize->input_h = input_h;
+ resize->input_stride_in_bytes = input_stride_in_bytes;
+ resize->output_pixels = output_pixels;
+ resize->output_w = output_w;
+ resize->output_h = output_h;
+ resize->output_stride_in_bytes = output_stride_in_bytes;
+ resize->fast_alpha = 0;
+
+ stbir__init_and_set_layout( resize, pixel_layout, data_type );
+}
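+
+// Typical use of the extended API (illustrative sketch only; the buffer names
+// and dimensions below are hypothetical):
+//
+//   STBIR_RESIZE resize;
+//   stbir_resize_init( &resize, in_pixels, in_w, in_h, 0,    // stride 0 = tightly packed
+//                      out_pixels, out_w, out_h, 0,
+//                      STBIR_RGBA, STBIR_TYPE_UINT8 );
+//   // optionally adjust settings here (filters, edge modes, subrects, ...)
+//   stbir_resize_extended( &resize );                        // returns 0 on failure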
+
+// You can update parameters any time after resize_init
+STBIRDEF void stbir_set_datatypes( STBIR_RESIZE * resize, stbir_datatype input_type, stbir_datatype output_type ) // by default, datatype from resize_init
+{
+ resize->input_data_type = input_type;
+ resize->output_data_type = output_type;
+ if ( ( resize->samplers ) && ( !resize->needs_rebuild ) )
+ stbir__update_info_from_resize( resize->samplers, resize );
+}
+
+STBIRDEF void stbir_set_pixel_callbacks( STBIR_RESIZE * resize, stbir_input_callback * input_cb, stbir_output_callback * output_cb ) // no callbacks by default
+{
+ resize->input_cb = input_cb;
+ resize->output_cb = output_cb;
+
+ if ( ( resize->samplers ) && ( !resize->needs_rebuild ) )
+ {
+ resize->samplers->in_pixels_cb = input_cb;
+ resize->samplers->out_pixels_cb = output_cb;
+ }
+}
+
+STBIRDEF void stbir_set_user_data( STBIR_RESIZE * resize, void * user_data ) // pass back STBIR_RESIZE* by default
+{
+ resize->user_data = user_data;
+ if ( ( resize->samplers ) && ( !resize->needs_rebuild ) )
+ resize->samplers->user_data = user_data;
+}
+
+STBIRDEF void stbir_set_buffer_ptrs( STBIR_RESIZE * resize, const void * input_pixels, int input_stride_in_bytes, void * output_pixels, int output_stride_in_bytes )
+{
+ resize->input_pixels = input_pixels;
+ resize->input_stride_in_bytes = input_stride_in_bytes;
+ resize->output_pixels = output_pixels;
+ resize->output_stride_in_bytes = output_stride_in_bytes;
+ if ( ( resize->samplers ) && ( !resize->needs_rebuild ) )
+ stbir__update_info_from_resize( resize->samplers, resize );
+}
+
+
+STBIRDEF int stbir_set_edgemodes( STBIR_RESIZE * resize, stbir_edge horizontal_edge, stbir_edge vertical_edge ) // CLAMP by default
+{
+ resize->horizontal_edge = horizontal_edge;
+ resize->vertical_edge = vertical_edge;
+ resize->needs_rebuild = 1;
+ return 1;
+}
+
+STBIRDEF int stbir_set_filters( STBIR_RESIZE * resize, stbir_filter horizontal_filter, stbir_filter vertical_filter ) // STBIR_DEFAULT_FILTER_UPSAMPLE/DOWNSAMPLE by default
+{
+ resize->horizontal_filter = horizontal_filter;
+ resize->vertical_filter = vertical_filter;
+ resize->needs_rebuild = 1;
+ return 1;
+}
+
+STBIRDEF int stbir_set_filter_callbacks( STBIR_RESIZE * resize, stbir__kernel_callback * horizontal_filter, stbir__support_callback * horizontal_support, stbir__kernel_callback * vertical_filter, stbir__support_callback * vertical_support )
+{
+ resize->horizontal_filter_kernel = horizontal_filter; resize->horizontal_filter_support = horizontal_support;
+ resize->vertical_filter_kernel = vertical_filter; resize->vertical_filter_support = vertical_support;
+ resize->needs_rebuild = 1;
+ return 1;
+}
+
+STBIRDEF int stbir_set_pixel_layouts( STBIR_RESIZE * resize, stbir_pixel_layout input_pixel_layout, stbir_pixel_layout output_pixel_layout ) // sets new pixel layouts
+{
+ resize->input_pixel_layout_public = input_pixel_layout;
+ resize->output_pixel_layout_public = output_pixel_layout;
+ resize->needs_rebuild = 1;
+ return 1;
+}
+
+
+STBIRDEF int stbir_set_non_pm_alpha_speed_over_quality( STBIR_RESIZE * resize, int non_pma_alpha_speed_over_quality ) // sets alpha speed
+{
+ resize->fast_alpha = non_pma_alpha_speed_over_quality;
+ resize->needs_rebuild = 1;
+ return 1;
+}
+
+STBIRDEF int stbir_set_input_subrect( STBIR_RESIZE * resize, double s0, double t0, double s1, double t1 ) // sets input region (full region by default)
+{
+ resize->input_s0 = s0;
+ resize->input_t0 = t0;
+ resize->input_s1 = s1;
+ resize->input_t1 = t1;
+ resize->needs_rebuild = 1;
+
+ // are we inbounds?
+ if ( ( s1 < stbir__small_float ) || ( (s1-s0) < stbir__small_float ) ||
+ ( t1 < stbir__small_float ) || ( (t1-t0) < stbir__small_float ) ||
+ ( s0 > (1.0f-stbir__small_float) ) ||
+ ( t0 > (1.0f-stbir__small_float) ) )
+ return 0;
+
+ return 1;
+}
+
+STBIRDEF int stbir_set_output_pixel_subrect( STBIR_RESIZE * resize, int subx, int suby, int subw, int subh ) // sets output region (full region by default)
+{
+ resize->output_subx = subx;
+ resize->output_suby = suby;
+ resize->output_subw = subw;
+ resize->output_subh = subh;
+ resize->needs_rebuild = 1;
+
+ // are we inbounds?
+ if ( ( subx >= resize->output_w ) || ( ( subx + subw ) <= 0 ) || ( suby >= resize->output_h ) || ( ( suby + subh ) <= 0 ) || ( subw == 0 ) || ( subh == 0 ) )
+ return 0;
+
+ return 1;
+}
+
+STBIRDEF int stbir_set_pixel_subrect( STBIR_RESIZE * resize, int subx, int suby, int subw, int subh ) // sets both regions (full regions by default)
+{
+ double s0, t0, s1, t1;
+
+ s0 = ( (double)subx ) / ( (double)resize->output_w );
+ t0 = ( (double)suby ) / ( (double)resize->output_h );
+ s1 = ( (double)(subx+subw) ) / ( (double)resize->output_w );
+ t1 = ( (double)(suby+subh) ) / ( (double)resize->output_h );
+
+ resize->input_s0 = s0;
+ resize->input_t0 = t0;
+ resize->input_s1 = s1;
+ resize->input_t1 = t1;
+ resize->output_subx = subx;
+ resize->output_suby = suby;
+ resize->output_subw = subw;
+ resize->output_subh = subh;
+ resize->needs_rebuild = 1;
+
+ // are we inbounds?
+ if ( ( subx >= resize->output_w ) || ( ( subx + subw ) <= 0 ) || ( suby >= resize->output_h ) || ( ( suby + subh ) <= 0 ) || ( subw == 0 ) || ( subh == 0 ) )
+ return 0;
+
+ return 1;
+}
+
+static int stbir__perform_build( STBIR_RESIZE * resize, int splits )
+{
+ stbir__contributors conservative = { 0, 0 };
+ stbir__sampler horizontal, vertical;
+ int new_output_subx, new_output_suby;
+ stbir__info * out_info;
+ #ifdef STBIR_PROFILE
+ stbir__info profile_infod; // used to contain building profile info before everything is allocated
+ stbir__info * profile_info = &profile_infod;
+ #endif
+
+ // have we already built the samplers?
+ if ( resize->samplers )
+ return 0;
+
+ #define STBIR_RETURN_ERROR_AND_ASSERT( exp ) STBIR_ASSERT( !(exp) ); if (exp) return 0;
+ STBIR_RETURN_ERROR_AND_ASSERT( (unsigned)resize->horizontal_filter >= STBIR_FILTER_OTHER)
+ STBIR_RETURN_ERROR_AND_ASSERT( (unsigned)resize->vertical_filter >= STBIR_FILTER_OTHER)
+ #undef STBIR_RETURN_ERROR_AND_ASSERT
+
+ if ( splits <= 0 )
+ return 0;
+
+ STBIR_PROFILE_BUILD_FIRST_START( build );
+
+ new_output_subx = resize->output_subx;
+ new_output_suby = resize->output_suby;
+
+ // do horizontal clip and scale calcs
+ if ( !stbir__calculate_region_transform( &horizontal.scale_info, resize->output_w, &new_output_subx, resize->output_subw, resize->input_w, resize->input_s0, resize->input_s1 ) )
+ return 0;
+
+ // do vertical clip and scale calcs
+ if ( !stbir__calculate_region_transform( &vertical.scale_info, resize->output_h, &new_output_suby, resize->output_subh, resize->input_h, resize->input_t0, resize->input_t1 ) )
+ return 0;
+
+ // if nothing to do, just return
+ if ( ( horizontal.scale_info.output_sub_size == 0 ) || ( vertical.scale_info.output_sub_size == 0 ) )
+ return 0;
+
+ stbir__set_sampler(&horizontal, resize->horizontal_filter, resize->horizontal_filter_kernel, resize->horizontal_filter_support, resize->horizontal_edge, &horizontal.scale_info, 1, resize->user_data );
+ stbir__get_conservative_extents( &horizontal, &conservative, resize->user_data );
+ stbir__set_sampler(&vertical, resize->vertical_filter, resize->vertical_filter_kernel, resize->vertical_filter_support, resize->vertical_edge, &vertical.scale_info, 0, resize->user_data );
+
+ if ( ( vertical.scale_info.output_sub_size / splits ) < STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS ) // each split should be a minimum of 4 scanlines (handwavey choice)
+ {
+ splits = vertical.scale_info.output_sub_size / STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS;
+ if ( splits == 0 ) splits = 1;
+ }
+
+ STBIR_PROFILE_BUILD_START( alloc );
+ out_info = stbir__alloc_internal_mem_and_build_samplers( &horizontal, &vertical, &conservative, resize->input_pixel_layout_public, resize->output_pixel_layout_public, splits, new_output_subx, new_output_suby, resize->fast_alpha, resize->user_data STBIR_ONLY_PROFILE_BUILD_SET_INFO );
+ STBIR_PROFILE_BUILD_END( alloc );
+ STBIR_PROFILE_BUILD_END( build );
+
+ if ( out_info )
+ {
+ resize->splits = splits;
+ resize->samplers = out_info;
+ resize->needs_rebuild = 0;
+ #ifdef STBIR_PROFILE
+ STBIR_MEMCPY( &out_info->profile, &profile_infod.profile, sizeof( out_info->profile ) );
+ #endif
+
+ // update anything that can be changed without recalcing samplers
+ stbir__update_info_from_resize( out_info, resize );
+
+ return splits;
+ }
+
+ return 0;
+}
+
+void stbir_free_samplers( STBIR_RESIZE * resize )
+{
+ if ( resize->samplers )
+ {
+ stbir__free_internal_mem( resize->samplers );
+ resize->samplers = 0;
+ resize->called_alloc = 0;
+ }
+}
+
+STBIRDEF int stbir_build_samplers_with_splits( STBIR_RESIZE * resize, int splits )
+{
+ if ( ( resize->samplers == 0 ) || ( resize->needs_rebuild ) )
+ {
+ if ( resize->samplers )
+ stbir_free_samplers( resize );
+
+ resize->called_alloc = 1;
+ return stbir__perform_build( resize, splits );
+ }
+
+ STBIR_PROFILE_BUILD_CLEAR( resize->samplers );
+
+ return 1;
+}
+
+STBIRDEF int stbir_build_samplers( STBIR_RESIZE * resize )
+{
+ return stbir_build_samplers_with_splits( resize, 1 );
+}
+
+STBIRDEF int stbir_resize_extended( STBIR_RESIZE * resize )
+{
+ int result;
+
+ if ( ( resize->samplers == 0 ) || ( resize->needs_rebuild ) )
+ {
+ int alloc_state = resize->called_alloc; // remember allocated state
+
+ if ( resize->samplers )
+ {
+ stbir__free_internal_mem( resize->samplers );
+ resize->samplers = 0;
+ }
+
+ if ( !stbir_build_samplers( resize ) )
+ return 0;
+
+ resize->called_alloc = alloc_state;
+
+ // if build_samplers succeeded (above), but there are no samplers set, then
+ // the area to stretch into was zero pixels, so don't do anything and return
+ // success
+ if ( resize->samplers == 0 )
+ return 1;
+ }
+ else
+ {
+ // didn't build anything - clear it
+ STBIR_PROFILE_BUILD_CLEAR( resize->samplers );
+ }
+
+ // do resize
+ result = stbir__perform_resize( resize->samplers, 0, resize->splits );
+
+ // if we alloced, then free
+ if ( !resize->called_alloc )
+ {
+ stbir_free_samplers( resize );
+ resize->samplers = 0;
+ }
+
+ return result;
+}
+
+STBIRDEF int stbir_resize_extended_split( STBIR_RESIZE * resize, int split_start, int split_count )
+{
+ STBIR_ASSERT( resize->samplers );
+
+ // if we're just doing the whole thing, call full
+ if ( ( split_start == -1 ) || ( ( split_start == 0 ) && ( split_count == resize->splits ) ) )
+ return stbir_resize_extended( resize );
+
+ // you **must** build samplers first when using split resize
+ if ( ( resize->samplers == 0 ) || ( resize->needs_rebuild ) )
+ return 0;
+
+ if ( ( split_start >= resize->splits ) || ( split_start < 0 ) || ( ( split_start + split_count ) > resize->splits ) || ( split_count <= 0 ) )
+ return 0;
+
+ // do resize
+ return stbir__perform_resize( resize->samplers, split_start, split_count );
+}
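+
+// Illustrative sketch (not part of the library) of driving the split API from
+// multiple threads: build the samplers once, hand each thread one split, then
+// free after every thread has finished.
+//
+//   int splits = stbir_build_samplers_with_splits( &resize, num_threads );
+//   // on each worker thread i, for 0 <= i < splits:
+//   stbir_resize_extended_split( &resize, i, 1 );
+//   // once all threads are done:
+//   stbir_free_samplers( &resize );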
+
+
+static void * stbir_quick_resize_helper( const void *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_layout, stbir_datatype data_type, stbir_edge edge, stbir_filter filter )
+{
+ STBIR_RESIZE resize;
+ int scanline_output_in_bytes;
+ int positive_output_stride_in_bytes;
+ void * start_ptr;
+ void * free_ptr;
+
+ scanline_output_in_bytes = output_w * stbir__type_size[ data_type ] * stbir__pixel_channels[ stbir__pixel_layout_convert_public_to_internal[ pixel_layout ] ];
+ if ( scanline_output_in_bytes == 0 )
+ return 0;
+
+ // if zero stride, use scanline output
+ if ( output_stride_in_bytes == 0 )
+ output_stride_in_bytes = scanline_output_in_bytes;
+
+ // abs value for inverted images (negative pitches)
+ positive_output_stride_in_bytes = output_stride_in_bytes;
+ if ( positive_output_stride_in_bytes < 0 )
+ positive_output_stride_in_bytes = -positive_output_stride_in_bytes;
+
+ // is the requested stride smaller than the scanline output? if so, just fail
+ if ( positive_output_stride_in_bytes < scanline_output_in_bytes )
+ return 0;
+
+ start_ptr = output_pixels;
+ free_ptr = 0; // no free pointer, since they passed buffer to use
+
+ // did they pass a zero for the dest? if so, allocate the buffer
+ if ( output_pixels == 0 )
+ {
+ size_t size;
+ char * ptr;
+
+ size = (size_t)positive_output_stride_in_bytes * (size_t)output_h;
+ if ( size == 0 )
+ return 0;
+
+ ptr = (char*) STBIR_MALLOC( size, 0 );
+ if ( ptr == 0 )
+ return 0;
+
+ free_ptr = ptr;
+
+ // point at the last scanline, if they requested a flipped image
+ if ( output_stride_in_bytes < 0 )
+ start_ptr = ptr + ( (size_t)positive_output_stride_in_bytes * (size_t)( output_h - 1 ) );
+ else
+ start_ptr = ptr;
+ }
+
+ // ok, now do the resize
+ stbir_resize_init( &resize,
+ input_pixels, input_w, input_h, input_stride_in_bytes,
+ start_ptr, output_w, output_h, output_stride_in_bytes,
+ pixel_layout, data_type );
+
+ resize.horizontal_edge = edge;
+ resize.vertical_edge = edge;
+ resize.horizontal_filter = filter;
+ resize.vertical_filter = filter;
+
+ if ( !stbir_resize_extended( &resize ) )
+ {
+ if ( free_ptr )
+ STBIR_FREE( free_ptr, 0 );
+ return 0;
+ }
+
+ return (free_ptr) ? free_ptr : start_ptr;
+}
+
+
+
+STBIRDEF unsigned char * stbir_resize_uint8_linear( const unsigned char *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_layout )
+{
+ return (unsigned char *) stbir_quick_resize_helper( input_pixels , input_w , input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout, STBIR_TYPE_UINT8, STBIR_EDGE_CLAMP, STBIR_FILTER_DEFAULT );
+}
+
+STBIRDEF unsigned char * stbir_resize_uint8_srgb( const unsigned char *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_layout )
+{
+ return (unsigned char *) stbir_quick_resize_helper( input_pixels , input_w , input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout, STBIR_TYPE_UINT8_SRGB, STBIR_EDGE_CLAMP, STBIR_FILTER_DEFAULT );
+}
+
+
+STBIRDEF float * stbir_resize_float_linear( const float *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ float *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_layout )
+{
+ return (float *) stbir_quick_resize_helper( input_pixels , input_w , input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout, STBIR_TYPE_FLOAT, STBIR_EDGE_CLAMP, STBIR_FILTER_DEFAULT );
+}
+
+
+STBIRDEF void * stbir_resize( const void *input_pixels , int input_w , int input_h, int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_pixel_layout pixel_layout, stbir_datatype data_type,
+ stbir_edge edge, stbir_filter filter )
+{
+ return (void *) stbir_quick_resize_helper( input_pixels , input_w , input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ pixel_layout, data_type, edge, filter );
+}
+
+#ifdef STBIR_PROFILE
+
+STBIRDEF void stbir_resize_build_profile_info( STBIR_PROFILE_INFO * info, STBIR_RESIZE const * resize )
+{
+ static char const * bdescriptions[6] = { "Building", "Allocating", "Horizontal sampler", "Vertical sampler", "Coefficient cleanup", "Coefficient pivot" } ;
+ stbir__info* samp = resize->samplers;
+ int i;
+
+ typedef int testa[ (STBIR__ARRAY_SIZE( bdescriptions ) == (STBIR__ARRAY_SIZE( samp->profile.array )-1) )?1:-1];
+ typedef int testb[ (sizeof( samp->profile.array ) == (sizeof(samp->profile.named)) )?1:-1];
+ typedef int testc[ (sizeof( info->clocks ) >= (sizeof(samp->profile.named)) )?1:-1];
+
+ for( i = 0 ; i < STBIR__ARRAY_SIZE( bdescriptions ) ; i++)
+ info->clocks[i] = samp->profile.array[i+1];
+
+ info->total_clocks = samp->profile.named.total;
+ info->descriptions = bdescriptions;
+ info->count = STBIR__ARRAY_SIZE( bdescriptions );
+}
+
+STBIRDEF void stbir_resize_split_profile_info( STBIR_PROFILE_INFO * info, STBIR_RESIZE const * resize, int split_start, int split_count )
+{
+ static char const * descriptions[7] = { "Looping", "Vertical sampling", "Horizontal sampling", "Scanline input", "Scanline output", "Alpha weighting", "Alpha unweighting" };
+ stbir__per_split_info * split_info;
+ int s, i;
+
+ typedef int testa[ (STBIR__ARRAY_SIZE( descriptions ) == (STBIR__ARRAY_SIZE( split_info->profile.array )-1) )?1:-1];
+ typedef int testb[ (sizeof( split_info->profile.array ) == (sizeof(split_info->profile.named)) )?1:-1];
+ typedef int testc[ (sizeof( info->clocks ) >= (sizeof(split_info->profile.named)) )?1:-1];
+
+ if ( split_start == -1 )
+ {
+ split_start = 0;
+ split_count = resize->samplers->splits;
+ }
+
+ if ( ( split_start >= resize->splits ) || ( split_start < 0 ) || ( ( split_start + split_count ) > resize->splits ) || ( split_count <= 0 ) )
+ {
+ info->total_clocks = 0;
+ info->descriptions = 0;
+ info->count = 0;
+ return;
+ }
+
+ split_info = resize->samplers->split_info + split_start;
+
+ // sum up the profile from all the splits
+ for( i = 0 ; i < STBIR__ARRAY_SIZE( descriptions ) ; i++ )
+ {
+ stbir_uint64 sum = 0;
+ for( s = 0 ; s < split_count ; s++ )
+ sum += split_info[s].profile.array[i+1];
+ info->clocks[i] = sum;
+ }
+
+ info->total_clocks = split_info->profile.named.total;
+ info->descriptions = descriptions;
+ info->count = STBIR__ARRAY_SIZE( descriptions );
+}
+
+STBIRDEF void stbir_resize_extended_profile_info( STBIR_PROFILE_INFO * info, STBIR_RESIZE const * resize )
+{
+ stbir_resize_split_profile_info( info, resize, -1, 0 );
+}
+
+#endif // STBIR_PROFILE
+
+#undef STBIR_BGR
+#undef STBIR_1CHANNEL
+#undef STBIR_2CHANNEL
+#undef STBIR_RGB
+#undef STBIR_RGBA
+#undef STBIR_4CHANNEL
+#undef STBIR_BGRA
+#undef STBIR_ARGB
+#undef STBIR_ABGR
+#undef STBIR_RA
+#undef STBIR_AR
+#undef STBIR_RGBA_PM
+#undef STBIR_BGRA_PM
+#undef STBIR_ARGB_PM
+#undef STBIR_ABGR_PM
+#undef STBIR_RA_PM
+#undef STBIR_AR_PM
+
+#endif // STB_IMAGE_RESIZE_IMPLEMENTATION
+
+#else // STB_IMAGE_RESIZE_HORIZONTALS&STB_IMAGE_RESIZE_DO_VERTICALS
+
+// we reinclude the header file to define all the horizontal functions
+// specializing each function for the number of coeffs is 20-40% faster *OVERALL*
+
+// by including the header file again this way, we can still debug the functions
+
+#define STBIR_strs_join2( start, mid, end ) start##mid##end
+#define STBIR_strs_join1( start, mid, end ) STBIR_strs_join2( start, mid, end )
+
+#define STBIR_strs_join24( start, mid1, mid2, end ) start##mid1##mid2##end
+#define STBIR_strs_join14( start, mid1, mid2, end ) STBIR_strs_join24( start, mid1, mid2, end )
+
+#ifdef STB_IMAGE_RESIZE_DO_CODERS
+
+#ifdef stbir__decode_suffix
+#define STBIR__CODER_NAME( name ) STBIR_strs_join1( name, _, stbir__decode_suffix )
+#else
+#define STBIR__CODER_NAME( name ) name
+#endif
+
+#ifdef stbir__decode_swizzle
+#define stbir__decode_simdf8_flip(reg) STBIR_strs_join1( STBIR_strs_join1( STBIR_strs_join1( STBIR_strs_join1( stbir__simdf8_0123to,stbir__decode_order0,stbir__decode_order1),stbir__decode_order2,stbir__decode_order3),stbir__decode_order0,stbir__decode_order1),stbir__decode_order2,stbir__decode_order3)(reg, reg)
+#define stbir__decode_simdf4_flip(reg) STBIR_strs_join1( STBIR_strs_join1( stbir__simdf_0123to,stbir__decode_order0,stbir__decode_order1),stbir__decode_order2,stbir__decode_order3)(reg, reg)
+#define stbir__encode_simdf8_unflip(reg) STBIR_strs_join1( STBIR_strs_join1( STBIR_strs_join1( STBIR_strs_join1( stbir__simdf8_0123to,stbir__encode_order0,stbir__encode_order1),stbir__encode_order2,stbir__encode_order3),stbir__encode_order0,stbir__encode_order1),stbir__encode_order2,stbir__encode_order3)(reg, reg)
+#define stbir__encode_simdf4_unflip(reg) STBIR_strs_join1( STBIR_strs_join1( stbir__simdf_0123to,stbir__encode_order0,stbir__encode_order1),stbir__encode_order2,stbir__encode_order3)(reg, reg)
+#else
+#define stbir__decode_order0 0
+#define stbir__decode_order1 1
+#define stbir__decode_order2 2
+#define stbir__decode_order3 3
+#define stbir__encode_order0 0
+#define stbir__encode_order1 1
+#define stbir__encode_order2 2
+#define stbir__encode_order3 3
+#define stbir__decode_simdf8_flip(reg)
+#define stbir__decode_simdf4_flip(reg)
+#define stbir__encode_simdf8_unflip(reg)
+#define stbir__encode_simdf4_unflip(reg)
+#endif
+
+#ifdef STBIR_SIMD8
+#define stbir__encode_simdfX_unflip stbir__encode_simdf8_unflip
+#else
+#define stbir__encode_simdfX_unflip stbir__encode_simdf4_unflip
+#endif
+
+static float * STBIR__CODER_NAME( stbir__decode_uint8_linear_scaled )( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned char const * input = (unsigned char const*)inputp;
+
+ #ifdef STBIR_SIMD
+ unsigned char const * end_input_m16 = input + width_times_channels - 16;
+ if ( width_times_channels >= 16 )
+ {
+ decode_end -= 16;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ #ifdef STBIR_SIMD8
+ stbir__simdi i; stbir__simdi8 o0,o1;
+ stbir__simdf8 of0, of1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi8_expand_u8_to_u32( o0, o1, i );
+ stbir__simdi8_convert_i32_to_float( of0, o0 );
+ stbir__simdi8_convert_i32_to_float( of1, o1 );
+ stbir__simdf8_mult( of0, of0, STBIR_max_uint8_as_float_inverted8);
+ stbir__simdf8_mult( of1, of1, STBIR_max_uint8_as_float_inverted8);
+ stbir__decode_simdf8_flip( of0 );
+ stbir__decode_simdf8_flip( of1 );
+ stbir__simdf8_store( decode + 0, of0 );
+ stbir__simdf8_store( decode + 8, of1 );
+ #else
+ stbir__simdi i, o0, o1, o2, o3;
+ stbir__simdf of0, of1, of2, of3;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi_expand_u8_to_u32( o0,o1,o2,o3,i);
+ stbir__simdi_convert_i32_to_float( of0, o0 );
+ stbir__simdi_convert_i32_to_float( of1, o1 );
+ stbir__simdi_convert_i32_to_float( of2, o2 );
+ stbir__simdi_convert_i32_to_float( of3, o3 );
+ stbir__simdf_mult( of0, of0, STBIR__CONSTF(STBIR_max_uint8_as_float_inverted) );
+ stbir__simdf_mult( of1, of1, STBIR__CONSTF(STBIR_max_uint8_as_float_inverted) );
+ stbir__simdf_mult( of2, of2, STBIR__CONSTF(STBIR_max_uint8_as_float_inverted) );
+ stbir__simdf_mult( of3, of3, STBIR__CONSTF(STBIR_max_uint8_as_float_inverted) );
+ stbir__decode_simdf4_flip( of0 );
+ stbir__decode_simdf4_flip( of1 );
+ stbir__decode_simdf4_flip( of2 );
+ stbir__decode_simdf4_flip( of3 );
+ stbir__simdf_store( decode + 0, of0 );
+ stbir__simdf_store( decode + 4, of1 );
+ stbir__simdf_store( decode + 8, of2 );
+ stbir__simdf_store( decode + 12, of3 );
+ #endif
+ decode += 16;
+ input += 16;
+ if ( decode <= decode_end )
+ continue;
+ if ( decode == ( decode_end + 16 ) )
+ break;
+ decode = decode_end; // backup and do last couple
+ input = end_input_m16;
+ }
+ return decode_end + 16;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode <= decode_end )
+ {
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0-4] = ((float)(input[stbir__decode_order0])) * stbir__max_uint8_as_float_inverted;
+ decode[1-4] = ((float)(input[stbir__decode_order1])) * stbir__max_uint8_as_float_inverted;
+ decode[2-4] = ((float)(input[stbir__decode_order2])) * stbir__max_uint8_as_float_inverted;
+ decode[3-4] = ((float)(input[stbir__decode_order3])) * stbir__max_uint8_as_float_inverted;
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = ((float)(input[stbir__decode_order0])) * stbir__max_uint8_as_float_inverted;
+ #if stbir__coder_min_num >= 2
+ decode[1] = ((float)(input[stbir__decode_order1])) * stbir__max_uint8_as_float_inverted;
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = ((float)(input[stbir__decode_order2])) * stbir__max_uint8_as_float_inverted;
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+
+ return decode_end;
+}
+
+static void STBIR__CODER_NAME( stbir__encode_uint8_linear_scaled )( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned char STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned char *) outputp;
+ unsigned char * end_output = ( (unsigned char *) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ if ( width_times_channels >= stbir__simdfX_float_count*2 )
+ {
+ float const * end_encode_m8 = encode + width_times_channels - stbir__simdfX_float_count*2;
+ end_output -= stbir__simdfX_float_count*2;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdfX e0, e1;
+ stbir__simdi i;
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdfX_madd_mem( e0, STBIR_simd_point5X, STBIR_max_uint8_as_floatX, encode );
+ stbir__simdfX_madd_mem( e1, STBIR_simd_point5X, STBIR_max_uint8_as_floatX, encode+stbir__simdfX_float_count );
+ stbir__encode_simdfX_unflip( e0 );
+ stbir__encode_simdfX_unflip( e1 );
+ #ifdef STBIR_SIMD8
+ stbir__simdf8_pack_to_16bytes( i, e0, e1 );
+ stbir__simdi_store( output, i );
+ #else
+ stbir__simdf_pack_to_8bytes( i, e0, e1 );
+ stbir__simdi_store2( output, i );
+ #endif
+ encode += stbir__simdfX_float_count*2;
+ output += stbir__simdfX_float_count*2;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + stbir__simdfX_float_count*2 ) )
+ break;
+ output = end_output; // backup and do last couple
+ encode = end_encode_m8;
+ }
+ return;
+ }
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ stbir__simdf e0;
+ stbir__simdi i0;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_load( e0, encode );
+ stbir__simdf_madd( e0, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint8_as_float), e0 );
+ stbir__encode_simdf4_unflip( e0 );
+ stbir__simdf_pack_to_8bytes( i0, e0, e0 ); // only use first 4
+ *(int*)(output-4) = stbir__simdi_to_int( i0 );
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ stbir__simdf e0;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_madd1_mem( e0, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint8_as_float), encode+stbir__encode_order0 ); output[0] = stbir__simdf_convert_float_to_uint8( e0 );
+ #if stbir__coder_min_num >= 2
+ stbir__simdf_madd1_mem( e0, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint8_as_float), encode+stbir__encode_order1 ); output[1] = stbir__simdf_convert_float_to_uint8( e0 );
+ #endif
+ #if stbir__coder_min_num >= 3
+ stbir__simdf_madd1_mem( e0, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint8_as_float), encode+stbir__encode_order2 ); output[2] = stbir__simdf_convert_float_to_uint8( e0 );
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+
+ #else
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ while( output <= end_output )
+ {
+ float f;
+ f = encode[stbir__encode_order0] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[0-4] = (unsigned char)f;
+ f = encode[stbir__encode_order1] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[1-4] = (unsigned char)f;
+ f = encode[stbir__encode_order2] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[2-4] = (unsigned char)f;
+ f = encode[stbir__encode_order3] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[3-4] = (unsigned char)f;
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ float f;
+ STBIR_NO_UNROLL(encode);
+ f = encode[stbir__encode_order0] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[0] = (unsigned char)f;
+ #if stbir__coder_min_num >= 2
+ f = encode[stbir__encode_order1] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[1] = (unsigned char)f;
+ #endif
+ #if stbir__coder_min_num >= 3
+ f = encode[stbir__encode_order2] * stbir__max_uint8_as_float + 0.5f; STBIR_CLAMP(f, 0, 255); output[2] = (unsigned char)f;
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+ #endif
+}
+
+static float * STBIR__CODER_NAME(stbir__decode_uint8_linear)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned char const * input = (unsigned char const*)inputp;
+
+ #ifdef STBIR_SIMD
+ unsigned char const * end_input_m16 = input + width_times_channels - 16;
+ if ( width_times_channels >= 16 )
+ {
+ decode_end -= 16;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ #ifdef STBIR_SIMD8
+ stbir__simdi i; stbir__simdi8 o0,o1;
+ stbir__simdf8 of0, of1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi8_expand_u8_to_u32( o0, o1, i );
+ stbir__simdi8_convert_i32_to_float( of0, o0 );
+ stbir__simdi8_convert_i32_to_float( of1, o1 );
+ stbir__decode_simdf8_flip( of0 );
+ stbir__decode_simdf8_flip( of1 );
+ stbir__simdf8_store( decode + 0, of0 );
+ stbir__simdf8_store( decode + 8, of1 );
+ #else
+ stbir__simdi i, o0, o1, o2, o3;
+ stbir__simdf of0, of1, of2, of3;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi_expand_u8_to_u32( o0,o1,o2,o3,i);
+ stbir__simdi_convert_i32_to_float( of0, o0 );
+ stbir__simdi_convert_i32_to_float( of1, o1 );
+ stbir__simdi_convert_i32_to_float( of2, o2 );
+ stbir__simdi_convert_i32_to_float( of3, o3 );
+ stbir__decode_simdf4_flip( of0 );
+ stbir__decode_simdf4_flip( of1 );
+ stbir__decode_simdf4_flip( of2 );
+ stbir__decode_simdf4_flip( of3 );
+ stbir__simdf_store( decode + 0, of0 );
+ stbir__simdf_store( decode + 4, of1 );
+ stbir__simdf_store( decode + 8, of2 );
+ stbir__simdf_store( decode + 12, of3 );
+ #endif
+ decode += 16;
+ input += 16;
+ if ( decode <= decode_end )
+ continue;
+ if ( decode == ( decode_end + 16 ) )
+ break;
+ decode = decode_end; // backup and do last couple
+ input = end_input_m16;
+ }
+ return decode_end + 16;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode <= decode_end )
+ {
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0-4] = ((float)(input[stbir__decode_order0]));
+ decode[1-4] = ((float)(input[stbir__decode_order1]));
+ decode[2-4] = ((float)(input[stbir__decode_order2]));
+ decode[3-4] = ((float)(input[stbir__decode_order3]));
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = ((float)(input[stbir__decode_order0]));
+ #if stbir__coder_min_num >= 2
+ decode[1] = ((float)(input[stbir__decode_order1]));
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = ((float)(input[stbir__decode_order2]));
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+ return decode_end;
+}
+
+static void STBIR__CODER_NAME( stbir__encode_uint8_linear )( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned char STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned char *) outputp;
+ unsigned char * end_output = ( (unsigned char *) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ if ( width_times_channels >= stbir__simdfX_float_count*2 )
+ {
+ float const * end_encode_m8 = encode + width_times_channels - stbir__simdfX_float_count*2;
+ end_output -= stbir__simdfX_float_count*2;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdfX e0, e1;
+ stbir__simdi i;
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdfX_add_mem( e0, STBIR_simd_point5X, encode );
+ stbir__simdfX_add_mem( e1, STBIR_simd_point5X, encode+stbir__simdfX_float_count );
+ stbir__encode_simdfX_unflip( e0 );
+ stbir__encode_simdfX_unflip( e1 );
+ #ifdef STBIR_SIMD8
+ stbir__simdf8_pack_to_16bytes( i, e0, e1 );
+ stbir__simdi_store( output, i );
+ #else
+ stbir__simdf_pack_to_8bytes( i, e0, e1 );
+ stbir__simdi_store2( output, i );
+ #endif
+ encode += stbir__simdfX_float_count*2;
+ output += stbir__simdfX_float_count*2;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + stbir__simdfX_float_count*2 ) )
+ break;
+ output = end_output; // backup and do last couple
+ encode = end_encode_m8;
+ }
+ return;
+ }
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ stbir__simdf e0;
+ stbir__simdi i0;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_load( e0, encode );
+ stbir__simdf_add( e0, STBIR__CONSTF(STBIR_simd_point5), e0 );
+ stbir__encode_simdf4_unflip( e0 );
+ stbir__simdf_pack_to_8bytes( i0, e0, e0 ); // only use first 4
+ *(int*)(output-4) = stbir__simdi_to_int( i0 );
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ #else
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ while( output <= end_output )
+ {
+ float f;
+ f = encode[stbir__encode_order0] + 0.5f; STBIR_CLAMP(f, 0, 255); output[0-4] = (unsigned char)f;
+ f = encode[stbir__encode_order1] + 0.5f; STBIR_CLAMP(f, 0, 255); output[1-4] = (unsigned char)f;
+ f = encode[stbir__encode_order2] + 0.5f; STBIR_CLAMP(f, 0, 255); output[2-4] = (unsigned char)f;
+ f = encode[stbir__encode_order3] + 0.5f; STBIR_CLAMP(f, 0, 255); output[3-4] = (unsigned char)f;
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ float f;
+ STBIR_NO_UNROLL(encode);
+ f = encode[stbir__encode_order0] + 0.5f; STBIR_CLAMP(f, 0, 255); output[0] = (unsigned char)f;
+ #if stbir__coder_min_num >= 2
+ f = encode[stbir__encode_order1] + 0.5f; STBIR_CLAMP(f, 0, 255); output[1] = (unsigned char)f;
+ #endif
+ #if stbir__coder_min_num >= 3
+ f = encode[stbir__encode_order2] + 0.5f; STBIR_CLAMP(f, 0, 255); output[2] = (unsigned char)f;
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+}
+
+static float * STBIR__CODER_NAME(stbir__decode_uint8_srgb)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned char const * input = (unsigned char const *)inputp;
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ while( decode <= decode_end )
+ {
+ decode[0-4] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order0 ] ];
+ decode[1-4] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order1 ] ];
+ decode[2-4] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order2 ] ];
+ decode[3-4] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order3 ] ];
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order0 ] ];
+ #if stbir__coder_min_num >= 2
+ decode[1] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order1 ] ];
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order2 ] ];
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+ return decode_end;
+}
+
+#define stbir__min_max_shift20( i, f ) \
+ stbir__simdf_max( f, f, stbir_simdf_casti(STBIR__CONSTI( STBIR_almost_zero )) ); \
+ stbir__simdf_min( f, f, stbir_simdf_casti(STBIR__CONSTI( STBIR_almost_one )) ); \
+ stbir__simdi_32shr( i, stbir_simdi_castf( f ), 20 );
+
+#define stbir__scale_and_convert( i, f ) \
+ stbir__simdf_madd( f, STBIR__CONSTF( STBIR_simd_point5 ), STBIR__CONSTF( STBIR_max_uint8_as_float ), f ); \
+ stbir__simdf_max( f, f, stbir__simdf_zeroP() ); \
+ stbir__simdf_min( f, f, STBIR__CONSTF( STBIR_max_uint8_as_float ) ); \
+ stbir__simdf_convert_float_to_i32( i, f );
+
+#define stbir__linear_to_srgb_finish( i, f ) \
+{ \
+ stbir__simdi temp; \
+ stbir__simdi_32shr( temp, stbir_simdi_castf( f ), 12 ) ; \
+ stbir__simdi_and( temp, temp, STBIR__CONSTI(STBIR_mastissa_mask) ); \
+ stbir__simdi_or( temp, temp, STBIR__CONSTI(STBIR_topscale) ); \
+ stbir__simdi_16madd( i, i, temp ); \
+ stbir__simdi_32shr( i, i, 16 ); \
+}
+
+#define stbir__simdi_table_lookup2( v0,v1, table ) \
+{ \
+ stbir__simdi_u32 temp0,temp1; \
+ temp0.m128i_i128 = v0; \
+ temp1.m128i_i128 = v1; \
+ temp0.m128i_u32[0] = table[temp0.m128i_i32[0]]; temp0.m128i_u32[1] = table[temp0.m128i_i32[1]]; temp0.m128i_u32[2] = table[temp0.m128i_i32[2]]; temp0.m128i_u32[3] = table[temp0.m128i_i32[3]]; \
+ temp1.m128i_u32[0] = table[temp1.m128i_i32[0]]; temp1.m128i_u32[1] = table[temp1.m128i_i32[1]]; temp1.m128i_u32[2] = table[temp1.m128i_i32[2]]; temp1.m128i_u32[3] = table[temp1.m128i_i32[3]]; \
+ v0 = temp0.m128i_i128; \
+ v1 = temp1.m128i_i128; \
+}
+
+#define stbir__simdi_table_lookup3( v0,v1,v2, table ) \
+{ \
+ stbir__simdi_u32 temp0,temp1,temp2; \
+ temp0.m128i_i128 = v0; \
+ temp1.m128i_i128 = v1; \
+ temp2.m128i_i128 = v2; \
+ temp0.m128i_u32[0] = table[temp0.m128i_i32[0]]; temp0.m128i_u32[1] = table[temp0.m128i_i32[1]]; temp0.m128i_u32[2] = table[temp0.m128i_i32[2]]; temp0.m128i_u32[3] = table[temp0.m128i_i32[3]]; \
+ temp1.m128i_u32[0] = table[temp1.m128i_i32[0]]; temp1.m128i_u32[1] = table[temp1.m128i_i32[1]]; temp1.m128i_u32[2] = table[temp1.m128i_i32[2]]; temp1.m128i_u32[3] = table[temp1.m128i_i32[3]]; \
+ temp2.m128i_u32[0] = table[temp2.m128i_i32[0]]; temp2.m128i_u32[1] = table[temp2.m128i_i32[1]]; temp2.m128i_u32[2] = table[temp2.m128i_i32[2]]; temp2.m128i_u32[3] = table[temp2.m128i_i32[3]]; \
+ v0 = temp0.m128i_i128; \
+ v1 = temp1.m128i_i128; \
+ v2 = temp2.m128i_i128; \
+}
+
+#define stbir__simdi_table_lookup4( v0,v1,v2,v3, table ) \
+{ \
+ stbir__simdi_u32 temp0,temp1,temp2,temp3; \
+ temp0.m128i_i128 = v0; \
+ temp1.m128i_i128 = v1; \
+ temp2.m128i_i128 = v2; \
+ temp3.m128i_i128 = v3; \
+ temp0.m128i_u32[0] = table[temp0.m128i_i32[0]]; temp0.m128i_u32[1] = table[temp0.m128i_i32[1]]; temp0.m128i_u32[2] = table[temp0.m128i_i32[2]]; temp0.m128i_u32[3] = table[temp0.m128i_i32[3]]; \
+ temp1.m128i_u32[0] = table[temp1.m128i_i32[0]]; temp1.m128i_u32[1] = table[temp1.m128i_i32[1]]; temp1.m128i_u32[2] = table[temp1.m128i_i32[2]]; temp1.m128i_u32[3] = table[temp1.m128i_i32[3]]; \
+ temp2.m128i_u32[0] = table[temp2.m128i_i32[0]]; temp2.m128i_u32[1] = table[temp2.m128i_i32[1]]; temp2.m128i_u32[2] = table[temp2.m128i_i32[2]]; temp2.m128i_u32[3] = table[temp2.m128i_i32[3]]; \
+ temp3.m128i_u32[0] = table[temp3.m128i_i32[0]]; temp3.m128i_u32[1] = table[temp3.m128i_i32[1]]; temp3.m128i_u32[2] = table[temp3.m128i_i32[2]]; temp3.m128i_u32[3] = table[temp3.m128i_i32[3]]; \
+ v0 = temp0.m128i_i128; \
+ v1 = temp1.m128i_i128; \
+ v2 = temp2.m128i_i128; \
+ v3 = temp3.m128i_i128; \
+}
+
+static void STBIR__CODER_NAME( stbir__encode_uint8_srgb )( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned char STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned char*) outputp;
+ unsigned char * end_output = ( (unsigned char*) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+
+ if ( width_times_channels >= 16 )
+ {
+ float const * end_encode_m16 = encode + width_times_channels - 16;
+ end_output -= 16;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdf f0, f1, f2, f3;
+ stbir__simdi i0, i1, i2, i3;
+ STBIR_SIMD_NO_UNROLL(encode);
+
+ stbir__simdf_load4_transposed( f0, f1, f2, f3, encode );
+
+ stbir__min_max_shift20( i0, f0 );
+ stbir__min_max_shift20( i1, f1 );
+ stbir__min_max_shift20( i2, f2 );
+ stbir__min_max_shift20( i3, f3 );
+
+ stbir__simdi_table_lookup4( i0, i1, i2, i3, ( fp32_to_srgb8_tab4 - (127-13)*8 ) );
+
+ stbir__linear_to_srgb_finish( i0, f0 );
+ stbir__linear_to_srgb_finish( i1, f1 );
+ stbir__linear_to_srgb_finish( i2, f2 );
+ stbir__linear_to_srgb_finish( i3, f3 );
+
+ stbir__interleave_pack_and_store_16_u8( output, STBIR_strs_join1(i, ,stbir__encode_order0), STBIR_strs_join1(i, ,stbir__encode_order1), STBIR_strs_join1(i, ,stbir__encode_order2), STBIR_strs_join1(i, ,stbir__encode_order3) );
+
+ encode += 16;
+ output += 16;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + 16 ) )
+ break;
+ output = end_output; // backup and do last couple
+ encode = end_encode_m16;
+ }
+ return;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while ( output <= end_output )
+ {
+ STBIR_SIMD_NO_UNROLL(encode);
+
+ output[0-4] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order0] );
+ output[1-4] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order1] );
+ output[2-4] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order2] );
+ output[3-4] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order3] );
+
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ STBIR_NO_UNROLL(encode);
+ output[0] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order0] );
+ #if stbir__coder_min_num >= 2
+ output[1] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order1] );
+ #endif
+ #if stbir__coder_min_num >= 3
+ output[2] = stbir__linear_to_srgb_uchar( encode[stbir__encode_order2] );
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+}
+
+#if ( stbir__coder_min_num == 4 ) || ( ( stbir__coder_min_num == 1 ) && ( !defined(stbir__decode_swizzle) ) )
+
+static float * STBIR__CODER_NAME(stbir__decode_uint8_srgb4_linearalpha)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned char const * input = (unsigned char const *)inputp;
+
+ do {
+ decode[0] = stbir__srgb_uchar_to_linear_float[ input[stbir__decode_order0] ];
+ decode[1] = stbir__srgb_uchar_to_linear_float[ input[stbir__decode_order1] ];
+ decode[2] = stbir__srgb_uchar_to_linear_float[ input[stbir__decode_order2] ];
+ decode[3] = ( (float) input[stbir__decode_order3] ) * stbir__max_uint8_as_float_inverted;
+ input += 4;
+ decode += 4;
+ } while( decode < decode_end );
+ return decode_end;
+}
+
+
+static void STBIR__CODER_NAME( stbir__encode_uint8_srgb4_linearalpha )( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned char STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned char*) outputp;
+ unsigned char * end_output = ( (unsigned char*) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+
+ if ( width_times_channels >= 16 )
+ {
+ float const * end_encode_m16 = encode + width_times_channels - 16;
+ end_output -= 16;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdf f0, f1, f2, f3;
+ stbir__simdi i0, i1, i2, i3;
+
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdf_load4_transposed( f0, f1, f2, f3, encode );
+
+ stbir__min_max_shift20( i0, f0 );
+ stbir__min_max_shift20( i1, f1 );
+ stbir__min_max_shift20( i2, f2 );
+ stbir__scale_and_convert( i3, f3 );
+
+ stbir__simdi_table_lookup3( i0, i1, i2, ( fp32_to_srgb8_tab4 - (127-13)*8 ) );
+
+ stbir__linear_to_srgb_finish( i0, f0 );
+ stbir__linear_to_srgb_finish( i1, f1 );
+ stbir__linear_to_srgb_finish( i2, f2 );
+
+ stbir__interleave_pack_and_store_16_u8( output, STBIR_strs_join1(i, ,stbir__encode_order0), STBIR_strs_join1(i, ,stbir__encode_order1), STBIR_strs_join1(i, ,stbir__encode_order2), STBIR_strs_join1(i, ,stbir__encode_order3) );
+
+ output += 16;
+ encode += 16;
+
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + 16 ) )
+ break;
+ output = end_output; // backup and do last couple
+ encode = end_encode_m16;
+ }
+ return;
+ }
+ #endif
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float f;
+ STBIR_SIMD_NO_UNROLL(encode);
+
+ output[stbir__decode_order0] = stbir__linear_to_srgb_uchar( encode[0] );
+ output[stbir__decode_order1] = stbir__linear_to_srgb_uchar( encode[1] );
+ output[stbir__decode_order2] = stbir__linear_to_srgb_uchar( encode[2] );
+
+ f = encode[3] * stbir__max_uint8_as_float + 0.5f;
+ STBIR_CLAMP(f, 0, 255);
+ output[stbir__decode_order3] = (unsigned char) f;
+
+ output += 4;
+ encode += 4;
+ } while( output < end_output );
+}
+
+#endif
+
+#if ( stbir__coder_min_num == 2 ) || ( ( stbir__coder_min_num == 1 ) && ( !defined(stbir__decode_swizzle) ) )
+
+static float * STBIR__CODER_NAME(stbir__decode_uint8_srgb2_linearalpha)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned char const * input = (unsigned char const *)inputp;
+
+ decode += 4;
+ while( decode <= decode_end )
+ {
+ decode[0-4] = stbir__srgb_uchar_to_linear_float[ input[stbir__decode_order0] ];
+ decode[1-4] = ( (float) input[stbir__decode_order1] ) * stbir__max_uint8_as_float_inverted;
+ decode[2-4] = stbir__srgb_uchar_to_linear_float[ input[stbir__decode_order0+2] ];
+ decode[3-4] = ( (float) input[stbir__decode_order1+2] ) * stbir__max_uint8_as_float_inverted;
+ input += 4;
+ decode += 4;
+ }
+ decode -= 4;
+ if( decode < decode_end )
+ {
+ decode[0] = stbir__srgb_uchar_to_linear_float[ input[ stbir__decode_order0 ] ];
+ decode[1] = ( (float) input[stbir__decode_order1] ) * stbir__max_uint8_as_float_inverted;
+ }
+ return decode_end;
+}
+
+static void STBIR__CODER_NAME( stbir__encode_uint8_srgb2_linearalpha )( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned char STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned char*) outputp;
+ unsigned char * end_output = ( (unsigned char*) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+
+ if ( width_times_channels >= 16 )
+ {
+ float const * end_encode_m16 = encode + width_times_channels - 16;
+ end_output -= 16;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdf f0, f1, f2, f3;
+ stbir__simdi i0, i1, i2, i3;
+
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdf_load4_transposed( f0, f1, f2, f3, encode );
+
+ stbir__min_max_shift20( i0, f0 );
+ stbir__scale_and_convert( i1, f1 );
+ stbir__min_max_shift20( i2, f2 );
+ stbir__scale_and_convert( i3, f3 );
+
+ stbir__simdi_table_lookup2( i0, i2, ( fp32_to_srgb8_tab4 - (127-13)*8 ) );
+
+ stbir__linear_to_srgb_finish( i0, f0 );
+ stbir__linear_to_srgb_finish( i2, f2 );
+
+ stbir__interleave_pack_and_store_16_u8( output, STBIR_strs_join1(i, ,stbir__encode_order0), STBIR_strs_join1(i, ,stbir__encode_order1), STBIR_strs_join1(i, ,stbir__encode_order2), STBIR_strs_join1(i, ,stbir__encode_order3) );
+
+ output += 16;
+ encode += 16;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + 16 ) )
+ break;
+ output = end_output; // backup and do last couple
+ encode = end_encode_m16;
+ }
+ return;
+ }
+ #endif
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float f;
+ STBIR_SIMD_NO_UNROLL(encode);
+
+ output[stbir__decode_order0] = stbir__linear_to_srgb_uchar( encode[0] );
+
+ f = encode[1] * stbir__max_uint8_as_float + 0.5f;
+ STBIR_CLAMP(f, 0, 255);
+ output[stbir__decode_order1] = (unsigned char) f;
+
+ output += 2;
+ encode += 2;
+ } while( output < end_output );
+}
+
+#endif
+
+static float * STBIR__CODER_NAME(stbir__decode_uint16_linear_scaled)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned short const * input = (unsigned short const *)inputp;
+
+ #ifdef STBIR_SIMD
+ unsigned short const * end_input_m8 = input + width_times_channels - 8;
+ if ( width_times_channels >= 8 )
+ {
+ decode_end -= 8;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ #ifdef STBIR_SIMD8
+ stbir__simdi i; stbir__simdi8 o;
+ stbir__simdf8 of;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi8_expand_u16_to_u32( o, i );
+ stbir__simdi8_convert_i32_to_float( of, o );
+ stbir__simdf8_mult( of, of, STBIR_max_uint16_as_float_inverted8);
+ stbir__decode_simdf8_flip( of );
+ stbir__simdf8_store( decode + 0, of );
+ #else
+ stbir__simdi i, o0, o1;
+ stbir__simdf of0, of1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi_expand_u16_to_u32( o0,o1,i );
+ stbir__simdi_convert_i32_to_float( of0, o0 );
+ stbir__simdi_convert_i32_to_float( of1, o1 );
+ stbir__simdf_mult( of0, of0, STBIR__CONSTF(STBIR_max_uint16_as_float_inverted) );
+ stbir__simdf_mult( of1, of1, STBIR__CONSTF(STBIR_max_uint16_as_float_inverted));
+ stbir__decode_simdf4_flip( of0 );
+ stbir__decode_simdf4_flip( of1 );
+ stbir__simdf_store( decode + 0, of0 );
+ stbir__simdf_store( decode + 4, of1 );
+ #endif
+ decode += 8;
+ input += 8;
+ if ( decode <= decode_end )
+ continue;
+ if ( decode == ( decode_end + 8 ) )
+ break;
+ decode = decode_end; // backup and do last couple
+ input = end_input_m8;
+ }
+ return decode_end + 8;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode <= decode_end )
+ {
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0-4] = ((float)(input[stbir__decode_order0])) * stbir__max_uint16_as_float_inverted;
+ decode[1-4] = ((float)(input[stbir__decode_order1])) * stbir__max_uint16_as_float_inverted;
+ decode[2-4] = ((float)(input[stbir__decode_order2])) * stbir__max_uint16_as_float_inverted;
+ decode[3-4] = ((float)(input[stbir__decode_order3])) * stbir__max_uint16_as_float_inverted;
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = ((float)(input[stbir__decode_order0])) * stbir__max_uint16_as_float_inverted;
+ #if stbir__coder_min_num >= 2
+ decode[1] = ((float)(input[stbir__decode_order1])) * stbir__max_uint16_as_float_inverted;
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = ((float)(input[stbir__decode_order2])) * stbir__max_uint16_as_float_inverted;
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+ return decode_end;
+}
+
+
+static void STBIR__CODER_NAME(stbir__encode_uint16_linear_scaled)( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned short STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned short*) outputp;
+ unsigned short * end_output = ( (unsigned short*) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ {
+ if ( width_times_channels >= stbir__simdfX_float_count*2 )
+ {
+ float const * end_encode_m8 = encode + width_times_channels - stbir__simdfX_float_count*2;
+ end_output -= stbir__simdfX_float_count*2;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdfX e0, e1;
+ stbir__simdiX i;
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdfX_madd_mem( e0, STBIR_simd_point5X, STBIR_max_uint16_as_floatX, encode );
+ stbir__simdfX_madd_mem( e1, STBIR_simd_point5X, STBIR_max_uint16_as_floatX, encode+stbir__simdfX_float_count );
+ stbir__encode_simdfX_unflip( e0 );
+ stbir__encode_simdfX_unflip( e1 );
+ stbir__simdfX_pack_to_words( i, e0, e1 );
+ stbir__simdiX_store( output, i );
+ encode += stbir__simdfX_float_count*2;
+ output += stbir__simdfX_float_count*2;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + stbir__simdfX_float_count*2 ) )
+ break;
+ output = end_output; // backup and do last couple
+ encode = end_encode_m8;
+ }
+ return;
+ }
+ }
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ stbir__simdf e;
+ stbir__simdi i;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_load( e, encode );
+ stbir__simdf_madd( e, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint16_as_float), e );
+ stbir__encode_simdf4_unflip( e );
+ stbir__simdf_pack_to_8words( i, e, e ); // only use first 4
+ stbir__simdi_store2( output-4, i );
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ stbir__simdf e;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_madd1_mem( e, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint16_as_float), encode+stbir__encode_order0 ); output[0] = stbir__simdf_convert_float_to_short( e );
+ #if stbir__coder_min_num >= 2
+ stbir__simdf_madd1_mem( e, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint16_as_float), encode+stbir__encode_order1 ); output[1] = stbir__simdf_convert_float_to_short( e );
+ #endif
+ #if stbir__coder_min_num >= 3
+ stbir__simdf_madd1_mem( e, STBIR__CONSTF(STBIR_simd_point5), STBIR__CONSTF(STBIR_max_uint16_as_float), encode+stbir__encode_order2 ); output[2] = stbir__simdf_convert_float_to_short( e );
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+
+ #else
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ float f;
+ STBIR_SIMD_NO_UNROLL(encode);
+ f = encode[stbir__encode_order0] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[0-4] = (unsigned short)f;
+ f = encode[stbir__encode_order1] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[1-4] = (unsigned short)f;
+ f = encode[stbir__encode_order2] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[2-4] = (unsigned short)f;
+ f = encode[stbir__encode_order3] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[3-4] = (unsigned short)f;
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ float f;
+ STBIR_NO_UNROLL(encode);
+ f = encode[stbir__encode_order0] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[0] = (unsigned short)f;
+ #if stbir__coder_min_num >= 2
+ f = encode[stbir__encode_order1] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[1] = (unsigned short)f;
+ #endif
+ #if stbir__coder_min_num >= 3
+ f = encode[stbir__encode_order2] * stbir__max_uint16_as_float + 0.5f; STBIR_CLAMP(f, 0, 65535); output[2] = (unsigned short)f;
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+ #endif
+}
+
+static float * STBIR__CODER_NAME(stbir__decode_uint16_linear)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ unsigned short const * input = (unsigned short const *)inputp;
+
+ #ifdef STBIR_SIMD
+ unsigned short const * end_input_m8 = input + width_times_channels - 8;
+ if ( width_times_channels >= 8 )
+ {
+ decode_end -= 8;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ #ifdef STBIR_SIMD8
+ stbir__simdi i; stbir__simdi8 o;
+ stbir__simdf8 of;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi8_expand_u16_to_u32( o, i );
+ stbir__simdi8_convert_i32_to_float( of, o );
+ stbir__decode_simdf8_flip( of );
+ stbir__simdf8_store( decode + 0, of );
+ #else
+ stbir__simdi i, o0, o1;
+ stbir__simdf of0, of1;
+ STBIR_NO_UNROLL(decode);
+ stbir__simdi_load( i, input );
+ stbir__simdi_expand_u16_to_u32( o0, o1, i );
+ stbir__simdi_convert_i32_to_float( of0, o0 );
+ stbir__simdi_convert_i32_to_float( of1, o1 );
+ stbir__decode_simdf4_flip( of0 );
+ stbir__decode_simdf4_flip( of1 );
+ stbir__simdf_store( decode + 0, of0 );
+ stbir__simdf_store( decode + 4, of1 );
+ #endif
+ decode += 8;
+ input += 8;
+ if ( decode <= decode_end )
+ continue;
+ if ( decode == ( decode_end + 8 ) )
+ break;
+        decode = decode_end; // back up and redo the last partial block
+ input = end_input_m8;
+ }
+ return decode_end + 8;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode <= decode_end )
+ {
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0-4] = ((float)(input[stbir__decode_order0]));
+ decode[1-4] = ((float)(input[stbir__decode_order1]));
+ decode[2-4] = ((float)(input[stbir__decode_order2]));
+ decode[3-4] = ((float)(input[stbir__decode_order3]));
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = ((float)(input[stbir__decode_order0]));
+ #if stbir__coder_min_num >= 2
+ decode[1] = ((float)(input[stbir__decode_order1]));
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = ((float)(input[stbir__decode_order2]));
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+ return decode_end;
+}
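The SIMD loops above handle the tail by backing up: once fewer than 8 elements remain, the pointers are reset so the final 8-wide operation ends exactly at the buffer end, re-processing a few already-done elements instead of falling back to a scalar tail. A scalar stand-in for the same control flow (illustrative names; like the guarded SIMD path, it requires n >= 8):

```c
#include <assert.h>

/* Process in blocks of 8; the final partial block is redone aligned to the
   end of the buffer. Safe because each element's result depends only on the
   corresponding source element. Caller must ensure n >= 8. */
static void scale_by_2_blocks_of_8( float *dst, const float *src, int n )
{
    const float *src_last = src + n - 8;  /* start of the final full block */
    float *dst_end = dst + n - 8;
    for (;;)
    {
        for ( int i = 0; i < 8; i++ )     /* stands in for one 8-wide SIMD op */
            dst[i] = src[i] * 2.0f;
        dst += 8;
        src += 8;
        if ( dst <= dst_end )
            continue;                     /* another full block remains */
        if ( dst == ( dst_end + 8 ) )
            break;                        /* n was a multiple of 8: done */
        dst = dst_end;                    /* back up so the last block */
        src = src_last;                   /*   ends exactly at element n */
    }
}
```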
+
+static void STBIR__CODER_NAME(stbir__encode_uint16_linear)( void * outputp, int width_times_channels, float const * encode )
+{
+ unsigned short STBIR_SIMD_STREAMOUT_PTR( * ) output = (unsigned short*) outputp;
+ unsigned short * end_output = ( (unsigned short*) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ {
+ if ( width_times_channels >= stbir__simdfX_float_count*2 )
+ {
+ float const * end_encode_m8 = encode + width_times_channels - stbir__simdfX_float_count*2;
+ end_output -= stbir__simdfX_float_count*2;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdfX e0, e1;
+ stbir__simdiX i;
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdfX_add_mem( e0, STBIR_simd_point5X, encode );
+ stbir__simdfX_add_mem( e1, STBIR_simd_point5X, encode+stbir__simdfX_float_count );
+ stbir__encode_simdfX_unflip( e0 );
+ stbir__encode_simdfX_unflip( e1 );
+ stbir__simdfX_pack_to_words( i, e0, e1 );
+ stbir__simdiX_store( output, i );
+ encode += stbir__simdfX_float_count*2;
+ output += stbir__simdfX_float_count*2;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + stbir__simdfX_float_count*2 ) )
+ break;
+          output = end_output; // back up and redo the last partial block
+ encode = end_encode_m8;
+ }
+ return;
+ }
+ }
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ stbir__simdf e;
+ stbir__simdi i;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_load( e, encode );
+ stbir__simdf_add( e, STBIR__CONSTF(STBIR_simd_point5), e );
+ stbir__encode_simdf4_unflip( e );
+ stbir__simdf_pack_to_8words( i, e, e ); // only use first 4
+ stbir__simdi_store2( output-4, i );
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ #else
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ float f;
+ STBIR_SIMD_NO_UNROLL(encode);
+ f = encode[stbir__encode_order0] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[0-4] = (unsigned short)f;
+ f = encode[stbir__encode_order1] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[1-4] = (unsigned short)f;
+ f = encode[stbir__encode_order2] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[2-4] = (unsigned short)f;
+ f = encode[stbir__encode_order3] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[3-4] = (unsigned short)f;
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ float f;
+ STBIR_NO_UNROLL(encode);
+ f = encode[stbir__encode_order0] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[0] = (unsigned short)f;
+ #if stbir__coder_min_num >= 2
+ f = encode[stbir__encode_order1] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[1] = (unsigned short)f;
+ #endif
+ #if stbir__coder_min_num >= 3
+ f = encode[stbir__encode_order2] + 0.5f; STBIR_CLAMP(f, 0, 65535); output[2] = (unsigned short)f;
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+}
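The blocks-of-4 loops above pre-advance the output pointer by one block, so that `output <= end_output` tests whether a full block of 4 remains, and the stores go through negative offsets like `output[0-4]`. A minimal standalone sketch of the same trick (illustrative names, not stb's):

```c
#include <assert.h>

/* Pointer pre-advanced past the block: "out <= end" means 4 full elements
   remain, and writes land at negative offsets. Works for any n >= 0. */
static void copy_blocks_of_4( int *out, const int *in, int n )
{
    int *end = out + n;
    out += 4;                 /* pre-advance by one block */
    while ( out <= end )      /* a full block of 4 remains */
    {
        out[0-4] = in[0];
        out[1-4] = in[1];
        out[2-4] = in[2];
        out[3-4] = in[3];
        out += 4;
        in  += 4;
    }
    out -= 4;                 /* undo the pre-advance */
    while ( out < end )       /* scalar tail, 0..3 elements */
        *out++ = *in++;
}
```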
+
+static float * STBIR__CODER_NAME(stbir__decode_half_float_linear)( float * decodep, int width_times_channels, void const * inputp )
+{
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ stbir__FP16 const * input = (stbir__FP16 const *)inputp;
+
+ #ifdef STBIR_SIMD
+ if ( width_times_channels >= 8 )
+ {
+ stbir__FP16 const * end_input_m8 = input + width_times_channels - 8;
+ decode_end -= 8;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ STBIR_NO_UNROLL(decode);
+
+ stbir__half_to_float_SIMD( decode, input );
+ #ifdef stbir__decode_swizzle
+ #ifdef STBIR_SIMD8
+ {
+ stbir__simdf8 of;
+ stbir__simdf8_load( of, decode );
+ stbir__decode_simdf8_flip( of );
+ stbir__simdf8_store( decode, of );
+ }
+ #else
+ {
+ stbir__simdf of0,of1;
+ stbir__simdf_load( of0, decode );
+ stbir__simdf_load( of1, decode+4 );
+ stbir__decode_simdf4_flip( of0 );
+ stbir__decode_simdf4_flip( of1 );
+ stbir__simdf_store( decode, of0 );
+ stbir__simdf_store( decode+4, of1 );
+ }
+ #endif
+ #endif
+ decode += 8;
+ input += 8;
+ if ( decode <= decode_end )
+ continue;
+ if ( decode == ( decode_end + 8 ) )
+ break;
+        decode = decode_end; // back up and redo the last partial block
+ input = end_input_m8;
+ }
+ return decode_end + 8;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode <= decode_end )
+ {
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0-4] = stbir__half_to_float(input[stbir__decode_order0]);
+ decode[1-4] = stbir__half_to_float(input[stbir__decode_order1]);
+ decode[2-4] = stbir__half_to_float(input[stbir__decode_order2]);
+ decode[3-4] = stbir__half_to_float(input[stbir__decode_order3]);
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = stbir__half_to_float(input[stbir__decode_order0]);
+ #if stbir__coder_min_num >= 2
+ decode[1] = stbir__half_to_float(input[stbir__decode_order1]);
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = stbir__half_to_float(input[stbir__decode_order2]);
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+ return decode_end;
+}
+
+static void STBIR__CODER_NAME( stbir__encode_half_float_linear )( void * outputp, int width_times_channels, float const * encode )
+{
+ stbir__FP16 STBIR_SIMD_STREAMOUT_PTR( * ) output = (stbir__FP16*) outputp;
+ stbir__FP16 * end_output = ( (stbir__FP16*) output ) + width_times_channels;
+
+ #ifdef STBIR_SIMD
+ if ( width_times_channels >= 8 )
+ {
+ float const * end_encode_m8 = encode + width_times_channels - 8;
+ end_output -= 8;
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ STBIR_SIMD_NO_UNROLL(encode);
+ #ifdef stbir__decode_swizzle
+ #ifdef STBIR_SIMD8
+ {
+ stbir__simdf8 of;
+ stbir__simdf8_load( of, encode );
+ stbir__encode_simdf8_unflip( of );
+ stbir__float_to_half_SIMD( output, (float*)&of );
+ }
+ #else
+ {
+ stbir__simdf of[2];
+ stbir__simdf_load( of[0], encode );
+ stbir__simdf_load( of[1], encode+4 );
+ stbir__encode_simdf4_unflip( of[0] );
+ stbir__encode_simdf4_unflip( of[1] );
+ stbir__float_to_half_SIMD( output, (float*)of );
+ }
+ #endif
+ #else
+ stbir__float_to_half_SIMD( output, encode );
+ #endif
+ encode += 8;
+ output += 8;
+ if ( output <= end_output )
+ continue;
+ if ( output == ( end_output + 8 ) )
+ break;
+        output = end_output; // back up and redo the last partial block
+ encode = end_encode_m8;
+ }
+ return;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ STBIR_SIMD_NO_UNROLL(output);
+ output[0-4] = stbir__float_to_half(encode[stbir__encode_order0]);
+ output[1-4] = stbir__float_to_half(encode[stbir__encode_order1]);
+ output[2-4] = stbir__float_to_half(encode[stbir__encode_order2]);
+ output[3-4] = stbir__float_to_half(encode[stbir__encode_order3]);
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ STBIR_NO_UNROLL(output);
+ output[0] = stbir__float_to_half(encode[stbir__encode_order0]);
+ #if stbir__coder_min_num >= 2
+ output[1] = stbir__float_to_half(encode[stbir__encode_order1]);
+ #endif
+ #if stbir__coder_min_num >= 3
+ output[2] = stbir__float_to_half(encode[stbir__encode_order2]);
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+}
+
+static float * STBIR__CODER_NAME(stbir__decode_float_linear)( float * decodep, int width_times_channels, void const * inputp )
+{
+ #ifdef stbir__decode_swizzle
+ float STBIR_STREAMOUT_PTR( * ) decode = decodep;
+ float * decode_end = (float*) decode + width_times_channels;
+ float const * input = (float const *)inputp;
+
+ #ifdef STBIR_SIMD
+ if ( width_times_channels >= 16 )
+ {
+ float const * end_input_m16 = input + width_times_channels - 16;
+ decode_end -= 16;
+ STBIR_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ STBIR_NO_UNROLL(decode);
+ #ifdef stbir__decode_swizzle
+ #ifdef STBIR_SIMD8
+ {
+ stbir__simdf8 of0,of1;
+ stbir__simdf8_load( of0, input );
+ stbir__simdf8_load( of1, input+8 );
+ stbir__decode_simdf8_flip( of0 );
+ stbir__decode_simdf8_flip( of1 );
+ stbir__simdf8_store( decode, of0 );
+ stbir__simdf8_store( decode+8, of1 );
+ }
+ #else
+ {
+ stbir__simdf of0,of1,of2,of3;
+ stbir__simdf_load( of0, input );
+ stbir__simdf_load( of1, input+4 );
+ stbir__simdf_load( of2, input+8 );
+ stbir__simdf_load( of3, input+12 );
+ stbir__decode_simdf4_flip( of0 );
+ stbir__decode_simdf4_flip( of1 );
+ stbir__decode_simdf4_flip( of2 );
+ stbir__decode_simdf4_flip( of3 );
+ stbir__simdf_store( decode, of0 );
+ stbir__simdf_store( decode+4, of1 );
+ stbir__simdf_store( decode+8, of2 );
+ stbir__simdf_store( decode+12, of3 );
+ }
+ #endif
+ #endif
+ decode += 16;
+ input += 16;
+ if ( decode <= decode_end )
+ continue;
+ if ( decode == ( decode_end + 16 ) )
+ break;
+        decode = decode_end; // back up and redo the last partial block
+ input = end_input_m16;
+ }
+ return decode_end + 16;
+ }
+ #endif
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ decode += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( decode <= decode_end )
+ {
+ STBIR_SIMD_NO_UNROLL(decode);
+ decode[0-4] = input[stbir__decode_order0];
+ decode[1-4] = input[stbir__decode_order1];
+ decode[2-4] = input[stbir__decode_order2];
+ decode[3-4] = input[stbir__decode_order3];
+ decode += 4;
+ input += 4;
+ }
+ decode -= 4;
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( decode < decode_end )
+ {
+ STBIR_NO_UNROLL(decode);
+ decode[0] = input[stbir__decode_order0];
+ #if stbir__coder_min_num >= 2
+ decode[1] = input[stbir__decode_order1];
+ #endif
+ #if stbir__coder_min_num >= 3
+ decode[2] = input[stbir__decode_order2];
+ #endif
+ decode += stbir__coder_min_num;
+ input += stbir__coder_min_num;
+ }
+ #endif
+ return decode_end;
+
+ #else
+
+ if ( (void*)decodep != inputp )
+ STBIR_MEMCPY( decodep, inputp, width_times_channels * sizeof( float ) );
+
+ return decodep + width_times_channels;
+
+ #endif
+}
+
+static void STBIR__CODER_NAME( stbir__encode_float_linear )( void * outputp, int width_times_channels, float const * encode )
+{
+  #if !defined( STBIR_FLOAT_HIGH_CLAMP ) && !defined(STBIR_FLOAT_LOW_CLAMP) && !defined(stbir__decode_swizzle)
+
+ if ( (void*)outputp != (void*) encode )
+ STBIR_MEMCPY( outputp, encode, width_times_channels * sizeof( float ) );
+
+ #else
+
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = (float*) outputp;
+ float * end_output = ( (float*) output ) + width_times_channels;
+
+ #ifdef STBIR_FLOAT_HIGH_CLAMP
+ #define stbir_scalar_hi_clamp( v ) if ( v > STBIR_FLOAT_HIGH_CLAMP ) v = STBIR_FLOAT_HIGH_CLAMP;
+ #else
+ #define stbir_scalar_hi_clamp( v )
+ #endif
+ #ifdef STBIR_FLOAT_LOW_CLAMP
+ #define stbir_scalar_lo_clamp( v ) if ( v < STBIR_FLOAT_LOW_CLAMP ) v = STBIR_FLOAT_LOW_CLAMP;
+ #else
+ #define stbir_scalar_lo_clamp( v )
+ #endif
+
+ #ifdef STBIR_SIMD
+
+ #ifdef STBIR_FLOAT_HIGH_CLAMP
+ const stbir__simdfX high_clamp = stbir__simdf_frepX(STBIR_FLOAT_HIGH_CLAMP);
+ #endif
+ #ifdef STBIR_FLOAT_LOW_CLAMP
+ const stbir__simdfX low_clamp = stbir__simdf_frepX(STBIR_FLOAT_LOW_CLAMP);
+ #endif
+
+ if ( width_times_channels >= ( stbir__simdfX_float_count * 2 ) )
+ {
+ float const * end_encode_m8 = encode + width_times_channels - ( stbir__simdfX_float_count * 2 );
+ end_output -= ( stbir__simdfX_float_count * 2 );
+ STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
+ for(;;)
+ {
+ stbir__simdfX e0, e1;
+ STBIR_SIMD_NO_UNROLL(encode);
+ stbir__simdfX_load( e0, encode );
+ stbir__simdfX_load( e1, encode+stbir__simdfX_float_count );
+#ifdef STBIR_FLOAT_HIGH_CLAMP
+ stbir__simdfX_min( e0, e0, high_clamp );
+ stbir__simdfX_min( e1, e1, high_clamp );
+#endif
+#ifdef STBIR_FLOAT_LOW_CLAMP
+ stbir__simdfX_max( e0, e0, low_clamp );
+ stbir__simdfX_max( e1, e1, low_clamp );
+#endif
+ stbir__encode_simdfX_unflip( e0 );
+ stbir__encode_simdfX_unflip( e1 );
+ stbir__simdfX_store( output, e0 );
+ stbir__simdfX_store( output+stbir__simdfX_float_count, e1 );
+ encode += stbir__simdfX_float_count * 2;
+ output += stbir__simdfX_float_count * 2;
+ if ( output < end_output )
+ continue;
+ if ( output == ( end_output + ( stbir__simdfX_float_count * 2 ) ) )
+ break;
+        output = end_output; // back up and redo the last partial block
+ encode = end_encode_m8;
+ }
+ return;
+ }
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ stbir__simdf e0;
+ STBIR_NO_UNROLL(encode);
+ stbir__simdf_load( e0, encode );
+#ifdef STBIR_FLOAT_HIGH_CLAMP
+ stbir__simdf_min( e0, e0, high_clamp );
+#endif
+#ifdef STBIR_FLOAT_LOW_CLAMP
+ stbir__simdf_max( e0, e0, low_clamp );
+#endif
+ stbir__encode_simdf4_unflip( e0 );
+ stbir__simdf_store( output-4, e0 );
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+ #endif
+
+ #else
+
+ // try to do blocks of 4 when you can
+ #if stbir__coder_min_num != 3 // doesn't divide cleanly by four
+ output += 4;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while( output <= end_output )
+ {
+ float e;
+ STBIR_SIMD_NO_UNROLL(encode);
+ e = encode[ stbir__encode_order0 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[0-4] = e;
+ e = encode[ stbir__encode_order1 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[1-4] = e;
+ e = encode[ stbir__encode_order2 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[2-4] = e;
+ e = encode[ stbir__encode_order3 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[3-4] = e;
+ output += 4;
+ encode += 4;
+ }
+ output -= 4;
+
+ #endif
+
+ #endif
+
+ // do the remnants
+ #if stbir__coder_min_num < 4
+ STBIR_NO_UNROLL_LOOP_START
+ while( output < end_output )
+ {
+ float e;
+ STBIR_NO_UNROLL(encode);
+ e = encode[ stbir__encode_order0 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[0] = e;
+ #if stbir__coder_min_num >= 2
+ e = encode[ stbir__encode_order1 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[1] = e;
+ #endif
+ #if stbir__coder_min_num >= 3
+ e = encode[ stbir__encode_order2 ]; stbir_scalar_hi_clamp( e ); stbir_scalar_lo_clamp( e ); output[2] = e;
+ #endif
+ output += stbir__coder_min_num;
+ encode += stbir__coder_min_num;
+ }
+ #endif
+
+ #endif
+}
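The stbir_scalar_hi_clamp / stbir_scalar_lo_clamp macros above expand to nothing when the corresponding bound is not defined, so unclamped builds pay no per-element cost. A minimal sketch of that conditional-macro pattern; HI_BOUND, hi_clamp, and encode_one are illustrative names, not stb's:

```c
#include <assert.h>

/* When HI_BOUND is undefined, hi_clamp( v ) expands to nothing and the
   encode loop compiles to a plain copy. Here it is defined, so values
   above the bound saturate. */
#define HI_BOUND 1.0f

#ifdef HI_BOUND
#define hi_clamp( v ) if ( (v) > HI_BOUND ) (v) = HI_BOUND;
#else
#define hi_clamp( v )
#endif

static float encode_one( float v )
{
    hi_clamp( v );
    return v;
}
```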
+
+#undef stbir__decode_suffix
+#undef stbir__decode_simdf8_flip
+#undef stbir__decode_simdf4_flip
+#undef stbir__decode_order0
+#undef stbir__decode_order1
+#undef stbir__decode_order2
+#undef stbir__decode_order3
+#undef stbir__encode_order0
+#undef stbir__encode_order1
+#undef stbir__encode_order2
+#undef stbir__encode_order3
+#undef stbir__encode_simdf8_unflip
+#undef stbir__encode_simdf4_unflip
+#undef stbir__encode_simdfX_unflip
+#undef STBIR__CODER_NAME
+#undef stbir__coder_min_num
+#undef stbir__decode_swizzle
+#undef stbir_scalar_hi_clamp
+#undef stbir_scalar_lo_clamp
+#undef STB_IMAGE_RESIZE_DO_CODERS
+
+#elif defined( STB_IMAGE_RESIZE_DO_VERTICALS)
+
+#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#define STBIR_chans( start, end ) STBIR_strs_join14(start,STBIR__vertical_channels,end,_cont)
+#else
+#define STBIR_chans( start, end ) STBIR_strs_join1(start,STBIR__vertical_channels,end)
+#endif
+
+#if STBIR__vertical_channels >= 1
+#define stbIF0( code ) code
+#else
+#define stbIF0( code )
+#endif
+#if STBIR__vertical_channels >= 2
+#define stbIF1( code ) code
+#else
+#define stbIF1( code )
+#endif
+#if STBIR__vertical_channels >= 3
+#define stbIF2( code ) code
+#else
+#define stbIF2( code )
+#endif
+#if STBIR__vertical_channels >= 4
+#define stbIF3( code ) code
+#else
+#define stbIF3( code )
+#endif
+#if STBIR__vertical_channels >= 5
+#define stbIF4( code ) code
+#else
+#define stbIF4( code )
+#endif
+#if STBIR__vertical_channels >= 6
+#define stbIF5( code ) code
+#else
+#define stbIF5( code )
+#endif
+#if STBIR__vertical_channels >= 7
+#define stbIF6( code ) code
+#else
+#define stbIF6( code )
+#endif
+#if STBIR__vertical_channels >= 8
+#define stbIF7( code ) code
+#else
+#define stbIF7( code )
+#endif
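The stbIFn macros above let one function body be compiled once per channel count: stbIFn( code ) expands code only when STBIR__vertical_channels is at least n+1, so loads, multiplies, and stores for nonexistent channels disappear at preprocessing time. A tiny self-contained instantiation of the pattern for a fixed 2-channel case; myIFn, MY_CHANNELS, and scatter2 are illustrative names, not stb's:

```c
#include <assert.h>

#define MY_CHANNELS 2

#if MY_CHANNELS >= 1
#define myIF0( code ) code
#else
#define myIF0( code )
#endif
#if MY_CHANNELS >= 2
#define myIF1( code ) code
#else
#define myIF1( code )
#endif
#if MY_CHANNELS >= 3
#define myIF2( code ) code
#else
#define myIF2( code )
#endif

/* One body, specialized by the preprocessor: the out2 store below is
   compiled out entirely because MY_CHANNELS == 2. */
static void scatter2( float *out0, float *out1, float *out2, const float *in, int n )
{
    int i;
    (void) out2;  /* unused at MY_CHANNELS == 2 */
    for ( i = 0; i < n; i++ )
    {
        myIF0( out0[i] = in[i] * 0.5f;   )
        myIF1( out1[i] = in[i] * 0.25f;  )
        myIF2( out2[i] = in[i] * 0.125f; )
    }
}
```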
+
+static void STBIR_chans( stbir__vertical_scatter_with_,_coeffs)( float ** outputs, float const * vertical_coefficients, float const * input, float const * input_end )
+{
+ stbIF0( float STBIR_SIMD_STREAMOUT_PTR( * ) output0 = outputs[0]; float c0s = vertical_coefficients[0]; )
+ stbIF1( float STBIR_SIMD_STREAMOUT_PTR( * ) output1 = outputs[1]; float c1s = vertical_coefficients[1]; )
+ stbIF2( float STBIR_SIMD_STREAMOUT_PTR( * ) output2 = outputs[2]; float c2s = vertical_coefficients[2]; )
+ stbIF3( float STBIR_SIMD_STREAMOUT_PTR( * ) output3 = outputs[3]; float c3s = vertical_coefficients[3]; )
+ stbIF4( float STBIR_SIMD_STREAMOUT_PTR( * ) output4 = outputs[4]; float c4s = vertical_coefficients[4]; )
+ stbIF5( float STBIR_SIMD_STREAMOUT_PTR( * ) output5 = outputs[5]; float c5s = vertical_coefficients[5]; )
+ stbIF6( float STBIR_SIMD_STREAMOUT_PTR( * ) output6 = outputs[6]; float c6s = vertical_coefficients[6]; )
+ stbIF7( float STBIR_SIMD_STREAMOUT_PTR( * ) output7 = outputs[7]; float c7s = vertical_coefficients[7]; )
+
+ #ifdef STBIR_SIMD
+ {
+ stbIF0(stbir__simdfX c0 = stbir__simdf_frepX( c0s ); )
+ stbIF1(stbir__simdfX c1 = stbir__simdf_frepX( c1s ); )
+ stbIF2(stbir__simdfX c2 = stbir__simdf_frepX( c2s ); )
+ stbIF3(stbir__simdfX c3 = stbir__simdf_frepX( c3s ); )
+ stbIF4(stbir__simdfX c4 = stbir__simdf_frepX( c4s ); )
+ stbIF5(stbir__simdfX c5 = stbir__simdf_frepX( c5s ); )
+ stbIF6(stbir__simdfX c6 = stbir__simdf_frepX( c6s ); )
+ stbIF7(stbir__simdfX c7 = stbir__simdf_frepX( c7s ); )
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while ( ( (char*)input_end - (char*) input ) >= (16*stbir__simdfX_float_count) )
+ {
+ stbir__simdfX o0, o1, o2, o3, r0, r1, r2, r3;
+ STBIR_SIMD_NO_UNROLL(output0);
+
+ stbir__simdfX_load( r0, input ); stbir__simdfX_load( r1, input+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input+(3*stbir__simdfX_float_count) );
+
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( stbir__simdfX_load( o0, output0 ); stbir__simdfX_load( o1, output0+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output0+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output0+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c0 ); stbir__simdfX_madd( o1, o1, r1, c0 ); stbir__simdfX_madd( o2, o2, r2, c0 ); stbir__simdfX_madd( o3, o3, r3, c0 );
+ stbir__simdfX_store( output0, o0 ); stbir__simdfX_store( output0+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output0+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output0+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF1( stbir__simdfX_load( o0, output1 ); stbir__simdfX_load( o1, output1+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output1+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output1+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c1 ); stbir__simdfX_madd( o1, o1, r1, c1 ); stbir__simdfX_madd( o2, o2, r2, c1 ); stbir__simdfX_madd( o3, o3, r3, c1 );
+ stbir__simdfX_store( output1, o0 ); stbir__simdfX_store( output1+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output1+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output1+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF2( stbir__simdfX_load( o0, output2 ); stbir__simdfX_load( o1, output2+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output2+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output2+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c2 ); stbir__simdfX_madd( o1, o1, r1, c2 ); stbir__simdfX_madd( o2, o2, r2, c2 ); stbir__simdfX_madd( o3, o3, r3, c2 );
+ stbir__simdfX_store( output2, o0 ); stbir__simdfX_store( output2+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output2+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output2+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF3( stbir__simdfX_load( o0, output3 ); stbir__simdfX_load( o1, output3+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output3+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output3+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c3 ); stbir__simdfX_madd( o1, o1, r1, c3 ); stbir__simdfX_madd( o2, o2, r2, c3 ); stbir__simdfX_madd( o3, o3, r3, c3 );
+ stbir__simdfX_store( output3, o0 ); stbir__simdfX_store( output3+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output3+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output3+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF4( stbir__simdfX_load( o0, output4 ); stbir__simdfX_load( o1, output4+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output4+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output4+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c4 ); stbir__simdfX_madd( o1, o1, r1, c4 ); stbir__simdfX_madd( o2, o2, r2, c4 ); stbir__simdfX_madd( o3, o3, r3, c4 );
+ stbir__simdfX_store( output4, o0 ); stbir__simdfX_store( output4+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output4+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output4+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF5( stbir__simdfX_load( o0, output5 ); stbir__simdfX_load( o1, output5+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output5+(2*stbir__simdfX_float_count)); stbir__simdfX_load( o3, output5+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c5 ); stbir__simdfX_madd( o1, o1, r1, c5 ); stbir__simdfX_madd( o2, o2, r2, c5 ); stbir__simdfX_madd( o3, o3, r3, c5 );
+ stbir__simdfX_store( output5, o0 ); stbir__simdfX_store( output5+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output5+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output5+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF6( stbir__simdfX_load( o0, output6 ); stbir__simdfX_load( o1, output6+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output6+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output6+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c6 ); stbir__simdfX_madd( o1, o1, r1, c6 ); stbir__simdfX_madd( o2, o2, r2, c6 ); stbir__simdfX_madd( o3, o3, r3, c6 );
+ stbir__simdfX_store( output6, o0 ); stbir__simdfX_store( output6+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output6+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output6+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF7( stbir__simdfX_load( o0, output7 ); stbir__simdfX_load( o1, output7+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output7+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output7+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c7 ); stbir__simdfX_madd( o1, o1, r1, c7 ); stbir__simdfX_madd( o2, o2, r2, c7 ); stbir__simdfX_madd( o3, o3, r3, c7 );
+ stbir__simdfX_store( output7, o0 ); stbir__simdfX_store( output7+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output7+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output7+(3*stbir__simdfX_float_count), o3 ); )
+ #else
+ stbIF0( stbir__simdfX_mult( o0, r0, c0 ); stbir__simdfX_mult( o1, r1, c0 ); stbir__simdfX_mult( o2, r2, c0 ); stbir__simdfX_mult( o3, r3, c0 );
+ stbir__simdfX_store( output0, o0 ); stbir__simdfX_store( output0+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output0+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output0+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF1( stbir__simdfX_mult( o0, r0, c1 ); stbir__simdfX_mult( o1, r1, c1 ); stbir__simdfX_mult( o2, r2, c1 ); stbir__simdfX_mult( o3, r3, c1 );
+ stbir__simdfX_store( output1, o0 ); stbir__simdfX_store( output1+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output1+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output1+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF2( stbir__simdfX_mult( o0, r0, c2 ); stbir__simdfX_mult( o1, r1, c2 ); stbir__simdfX_mult( o2, r2, c2 ); stbir__simdfX_mult( o3, r3, c2 );
+ stbir__simdfX_store( output2, o0 ); stbir__simdfX_store( output2+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output2+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output2+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF3( stbir__simdfX_mult( o0, r0, c3 ); stbir__simdfX_mult( o1, r1, c3 ); stbir__simdfX_mult( o2, r2, c3 ); stbir__simdfX_mult( o3, r3, c3 );
+ stbir__simdfX_store( output3, o0 ); stbir__simdfX_store( output3+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output3+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output3+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF4( stbir__simdfX_mult( o0, r0, c4 ); stbir__simdfX_mult( o1, r1, c4 ); stbir__simdfX_mult( o2, r2, c4 ); stbir__simdfX_mult( o3, r3, c4 );
+ stbir__simdfX_store( output4, o0 ); stbir__simdfX_store( output4+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output4+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output4+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF5( stbir__simdfX_mult( o0, r0, c5 ); stbir__simdfX_mult( o1, r1, c5 ); stbir__simdfX_mult( o2, r2, c5 ); stbir__simdfX_mult( o3, r3, c5 );
+ stbir__simdfX_store( output5, o0 ); stbir__simdfX_store( output5+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output5+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output5+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF6( stbir__simdfX_mult( o0, r0, c6 ); stbir__simdfX_mult( o1, r1, c6 ); stbir__simdfX_mult( o2, r2, c6 ); stbir__simdfX_mult( o3, r3, c6 );
+ stbir__simdfX_store( output6, o0 ); stbir__simdfX_store( output6+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output6+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output6+(3*stbir__simdfX_float_count), o3 ); )
+ stbIF7( stbir__simdfX_mult( o0, r0, c7 ); stbir__simdfX_mult( o1, r1, c7 ); stbir__simdfX_mult( o2, r2, c7 ); stbir__simdfX_mult( o3, r3, c7 );
+ stbir__simdfX_store( output7, o0 ); stbir__simdfX_store( output7+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output7+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output7+(3*stbir__simdfX_float_count), o3 ); )
+ #endif
+
+ input += (4*stbir__simdfX_float_count);
+ stbIF0( output0 += (4*stbir__simdfX_float_count); ) stbIF1( output1 += (4*stbir__simdfX_float_count); ) stbIF2( output2 += (4*stbir__simdfX_float_count); ) stbIF3( output3 += (4*stbir__simdfX_float_count); ) stbIF4( output4 += (4*stbir__simdfX_float_count); ) stbIF5( output5 += (4*stbir__simdfX_float_count); ) stbIF6( output6 += (4*stbir__simdfX_float_count); ) stbIF7( output7 += (4*stbir__simdfX_float_count); )
+ }
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while ( ( (char*)input_end - (char*) input ) >= 16 )
+ {
+ stbir__simdf o0, r0;
+ STBIR_SIMD_NO_UNROLL(output0);
+
+ stbir__simdf_load( r0, input );
+
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( stbir__simdf_load( o0, output0 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c0 ) ); stbir__simdf_store( output0, o0 ); )
+ stbIF1( stbir__simdf_load( o0, output1 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c1 ) ); stbir__simdf_store( output1, o0 ); )
+ stbIF2( stbir__simdf_load( o0, output2 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c2 ) ); stbir__simdf_store( output2, o0 ); )
+ stbIF3( stbir__simdf_load( o0, output3 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c3 ) ); stbir__simdf_store( output3, o0 ); )
+ stbIF4( stbir__simdf_load( o0, output4 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c4 ) ); stbir__simdf_store( output4, o0 ); )
+ stbIF5( stbir__simdf_load( o0, output5 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c5 ) ); stbir__simdf_store( output5, o0 ); )
+ stbIF6( stbir__simdf_load( o0, output6 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c6 ) ); stbir__simdf_store( output6, o0 ); )
+ stbIF7( stbir__simdf_load( o0, output7 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c7 ) ); stbir__simdf_store( output7, o0 ); )
+ #else
+ stbIF0( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c0 ) ); stbir__simdf_store( output0, o0 ); )
+ stbIF1( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c1 ) ); stbir__simdf_store( output1, o0 ); )
+ stbIF2( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c2 ) ); stbir__simdf_store( output2, o0 ); )
+ stbIF3( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c3 ) ); stbir__simdf_store( output3, o0 ); )
+ stbIF4( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c4 ) ); stbir__simdf_store( output4, o0 ); )
+ stbIF5( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c5 ) ); stbir__simdf_store( output5, o0 ); )
+ stbIF6( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c6 ) ); stbir__simdf_store( output6, o0 ); )
+ stbIF7( stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c7 ) ); stbir__simdf_store( output7, o0 ); )
+ #endif
+
+ input += 4;
+ stbIF0( output0 += 4; ) stbIF1( output1 += 4; ) stbIF2( output2 += 4; ) stbIF3( output3 += 4; ) stbIF4( output4 += 4; ) stbIF5( output5 += 4; ) stbIF6( output6 += 4; ) stbIF7( output7 += 4; )
+ }
+ }
+ #else
+ STBIR_NO_UNROLL_LOOP_START
+ while ( ( (char*)input_end - (char*) input ) >= 16 )
+ {
+ float r0, r1, r2, r3;
+ STBIR_NO_UNROLL(input);
+
+ r0 = input[0], r1 = input[1], r2 = input[2], r3 = input[3];
+
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( output0[0] += ( r0 * c0s ); output0[1] += ( r1 * c0s ); output0[2] += ( r2 * c0s ); output0[3] += ( r3 * c0s ); )
+ stbIF1( output1[0] += ( r0 * c1s ); output1[1] += ( r1 * c1s ); output1[2] += ( r2 * c1s ); output1[3] += ( r3 * c1s ); )
+ stbIF2( output2[0] += ( r0 * c2s ); output2[1] += ( r1 * c2s ); output2[2] += ( r2 * c2s ); output2[3] += ( r3 * c2s ); )
+ stbIF3( output3[0] += ( r0 * c3s ); output3[1] += ( r1 * c3s ); output3[2] += ( r2 * c3s ); output3[3] += ( r3 * c3s ); )
+ stbIF4( output4[0] += ( r0 * c4s ); output4[1] += ( r1 * c4s ); output4[2] += ( r2 * c4s ); output4[3] += ( r3 * c4s ); )
+ stbIF5( output5[0] += ( r0 * c5s ); output5[1] += ( r1 * c5s ); output5[2] += ( r2 * c5s ); output5[3] += ( r3 * c5s ); )
+ stbIF6( output6[0] += ( r0 * c6s ); output6[1] += ( r1 * c6s ); output6[2] += ( r2 * c6s ); output6[3] += ( r3 * c6s ); )
+ stbIF7( output7[0] += ( r0 * c7s ); output7[1] += ( r1 * c7s ); output7[2] += ( r2 * c7s ); output7[3] += ( r3 * c7s ); )
+ #else
+ stbIF0( output0[0] = ( r0 * c0s ); output0[1] = ( r1 * c0s ); output0[2] = ( r2 * c0s ); output0[3] = ( r3 * c0s ); )
+ stbIF1( output1[0] = ( r0 * c1s ); output1[1] = ( r1 * c1s ); output1[2] = ( r2 * c1s ); output1[3] = ( r3 * c1s ); )
+ stbIF2( output2[0] = ( r0 * c2s ); output2[1] = ( r1 * c2s ); output2[2] = ( r2 * c2s ); output2[3] = ( r3 * c2s ); )
+ stbIF3( output3[0] = ( r0 * c3s ); output3[1] = ( r1 * c3s ); output3[2] = ( r2 * c3s ); output3[3] = ( r3 * c3s ); )
+ stbIF4( output4[0] = ( r0 * c4s ); output4[1] = ( r1 * c4s ); output4[2] = ( r2 * c4s ); output4[3] = ( r3 * c4s ); )
+ stbIF5( output5[0] = ( r0 * c5s ); output5[1] = ( r1 * c5s ); output5[2] = ( r2 * c5s ); output5[3] = ( r3 * c5s ); )
+ stbIF6( output6[0] = ( r0 * c6s ); output6[1] = ( r1 * c6s ); output6[2] = ( r2 * c6s ); output6[3] = ( r3 * c6s ); )
+ stbIF7( output7[0] = ( r0 * c7s ); output7[1] = ( r1 * c7s ); output7[2] = ( r2 * c7s ); output7[3] = ( r3 * c7s ); )
+ #endif
+
+ input += 4;
+ stbIF0( output0 += 4; ) stbIF1( output1 += 4; ) stbIF2( output2 += 4; ) stbIF3( output3 += 4; ) stbIF4( output4 += 4; ) stbIF5( output5 += 4; ) stbIF6( output6 += 4; ) stbIF7( output7 += 4; )
+ }
+ #endif
+ STBIR_NO_UNROLL_LOOP_START
+ while ( input < input_end )
+ {
+ float r = input[0];
+ STBIR_NO_UNROLL(output0);
+
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( output0[0] += ( r * c0s ); )
+ stbIF1( output1[0] += ( r * c1s ); )
+ stbIF2( output2[0] += ( r * c2s ); )
+ stbIF3( output3[0] += ( r * c3s ); )
+ stbIF4( output4[0] += ( r * c4s ); )
+ stbIF5( output5[0] += ( r * c5s ); )
+ stbIF6( output6[0] += ( r * c6s ); )
+ stbIF7( output7[0] += ( r * c7s ); )
+ #else
+ stbIF0( output0[0] = ( r * c0s ); )
+ stbIF1( output1[0] = ( r * c1s ); )
+ stbIF2( output2[0] = ( r * c2s ); )
+ stbIF3( output3[0] = ( r * c3s ); )
+ stbIF4( output4[0] = ( r * c4s ); )
+ stbIF5( output5[0] = ( r * c5s ); )
+ stbIF6( output6[0] = ( r * c6s ); )
+ stbIF7( output7[0] = ( r * c7s ); )
+ #endif
+
+ ++input;
+ stbIF0( ++output0; ) stbIF1( ++output1; ) stbIF2( ++output2; ) stbIF3( ++output3; ) stbIF4( ++output4; ) stbIF5( ++output5; ) stbIF6( ++output6; ) stbIF7( ++output7; )
+ }
+}
+
+static void STBIR_chans( stbir__vertical_gather_with_,_coeffs)( float * outputp, float const * vertical_coefficients, float const ** inputs, float const * input0_end )
+{
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = outputp;
+
+ stbIF0( float const * input0 = inputs[0]; float c0s = vertical_coefficients[0]; )
+ stbIF1( float const * input1 = inputs[1]; float c1s = vertical_coefficients[1]; )
+ stbIF2( float const * input2 = inputs[2]; float c2s = vertical_coefficients[2]; )
+ stbIF3( float const * input3 = inputs[3]; float c3s = vertical_coefficients[3]; )
+ stbIF4( float const * input4 = inputs[4]; float c4s = vertical_coefficients[4]; )
+ stbIF5( float const * input5 = inputs[5]; float c5s = vertical_coefficients[5]; )
+ stbIF6( float const * input6 = inputs[6]; float c6s = vertical_coefficients[6]; )
+ stbIF7( float const * input7 = inputs[7]; float c7s = vertical_coefficients[7]; )
+
+#if ( STBIR__vertical_channels == 1 ) && !defined(STB_IMAGE_RESIZE_VERTICAL_CONTINUE)
+ // check single channel one weight
+ if ( ( c0s >= (1.0f-0.000001f) ) && ( c0s <= (1.0f+0.000001f) ) )
+ {
+ STBIR_MEMCPY( output, input0, (char*)input0_end - (char*)input0 );
+ return;
+ }
+#endif
+
+ #ifdef STBIR_SIMD
+ {
+ stbIF0(stbir__simdfX c0 = stbir__simdf_frepX( c0s ); )
+ stbIF1(stbir__simdfX c1 = stbir__simdf_frepX( c1s ); )
+ stbIF2(stbir__simdfX c2 = stbir__simdf_frepX( c2s ); )
+ stbIF3(stbir__simdfX c3 = stbir__simdf_frepX( c3s ); )
+ stbIF4(stbir__simdfX c4 = stbir__simdf_frepX( c4s ); )
+ stbIF5(stbir__simdfX c5 = stbir__simdf_frepX( c5s ); )
+ stbIF6(stbir__simdfX c6 = stbir__simdf_frepX( c6s ); )
+ stbIF7(stbir__simdfX c7 = stbir__simdf_frepX( c7s ); )
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while ( ( (char*)input0_end - (char*) input0 ) >= (16*stbir__simdfX_float_count) )
+ {
+ stbir__simdfX o0, o1, o2, o3, r0, r1, r2, r3;
+ STBIR_SIMD_NO_UNROLL(output);
+
+ // prefetch four loop iterations ahead (doesn't affect much for small resizes, but helps with big ones)
+ stbIF0( stbir__prefetch( input0 + (16*stbir__simdfX_float_count) ); )
+ stbIF1( stbir__prefetch( input1 + (16*stbir__simdfX_float_count) ); )
+ stbIF2( stbir__prefetch( input2 + (16*stbir__simdfX_float_count) ); )
+ stbIF3( stbir__prefetch( input3 + (16*stbir__simdfX_float_count) ); )
+ stbIF4( stbir__prefetch( input4 + (16*stbir__simdfX_float_count) ); )
+ stbIF5( stbir__prefetch( input5 + (16*stbir__simdfX_float_count) ); )
+ stbIF6( stbir__prefetch( input6 + (16*stbir__simdfX_float_count) ); )
+ stbIF7( stbir__prefetch( input7 + (16*stbir__simdfX_float_count) ); )
+
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( stbir__simdfX_load( o0, output ); stbir__simdfX_load( o1, output+stbir__simdfX_float_count ); stbir__simdfX_load( o2, output+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( o3, output+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_load( r0, input0 ); stbir__simdfX_load( r1, input0+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input0+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input0+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c0 ); stbir__simdfX_madd( o1, o1, r1, c0 ); stbir__simdfX_madd( o2, o2, r2, c0 ); stbir__simdfX_madd( o3, o3, r3, c0 ); )
+ #else
+ stbIF0( stbir__simdfX_load( r0, input0 ); stbir__simdfX_load( r1, input0+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input0+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input0+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_mult( o0, r0, c0 ); stbir__simdfX_mult( o1, r1, c0 ); stbir__simdfX_mult( o2, r2, c0 ); stbir__simdfX_mult( o3, r3, c0 ); )
+ #endif
+
+ stbIF1( stbir__simdfX_load( r0, input1 ); stbir__simdfX_load( r1, input1+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input1+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input1+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c1 ); stbir__simdfX_madd( o1, o1, r1, c1 ); stbir__simdfX_madd( o2, o2, r2, c1 ); stbir__simdfX_madd( o3, o3, r3, c1 ); )
+ stbIF2( stbir__simdfX_load( r0, input2 ); stbir__simdfX_load( r1, input2+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input2+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input2+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c2 ); stbir__simdfX_madd( o1, o1, r1, c2 ); stbir__simdfX_madd( o2, o2, r2, c2 ); stbir__simdfX_madd( o3, o3, r3, c2 ); )
+ stbIF3( stbir__simdfX_load( r0, input3 ); stbir__simdfX_load( r1, input3+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input3+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input3+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c3 ); stbir__simdfX_madd( o1, o1, r1, c3 ); stbir__simdfX_madd( o2, o2, r2, c3 ); stbir__simdfX_madd( o3, o3, r3, c3 ); )
+ stbIF4( stbir__simdfX_load( r0, input4 ); stbir__simdfX_load( r1, input4+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input4+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input4+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c4 ); stbir__simdfX_madd( o1, o1, r1, c4 ); stbir__simdfX_madd( o2, o2, r2, c4 ); stbir__simdfX_madd( o3, o3, r3, c4 ); )
+ stbIF5( stbir__simdfX_load( r0, input5 ); stbir__simdfX_load( r1, input5+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input5+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input5+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c5 ); stbir__simdfX_madd( o1, o1, r1, c5 ); stbir__simdfX_madd( o2, o2, r2, c5 ); stbir__simdfX_madd( o3, o3, r3, c5 ); )
+ stbIF6( stbir__simdfX_load( r0, input6 ); stbir__simdfX_load( r1, input6+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input6+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input6+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c6 ); stbir__simdfX_madd( o1, o1, r1, c6 ); stbir__simdfX_madd( o2, o2, r2, c6 ); stbir__simdfX_madd( o3, o3, r3, c6 ); )
+ stbIF7( stbir__simdfX_load( r0, input7 ); stbir__simdfX_load( r1, input7+stbir__simdfX_float_count ); stbir__simdfX_load( r2, input7+(2*stbir__simdfX_float_count) ); stbir__simdfX_load( r3, input7+(3*stbir__simdfX_float_count) );
+ stbir__simdfX_madd( o0, o0, r0, c7 ); stbir__simdfX_madd( o1, o1, r1, c7 ); stbir__simdfX_madd( o2, o2, r2, c7 ); stbir__simdfX_madd( o3, o3, r3, c7 ); )
+
+ stbir__simdfX_store( output, o0 ); stbir__simdfX_store( output+stbir__simdfX_float_count, o1 ); stbir__simdfX_store( output+(2*stbir__simdfX_float_count), o2 ); stbir__simdfX_store( output+(3*stbir__simdfX_float_count), o3 );
+ output += (4*stbir__simdfX_float_count);
+ stbIF0( input0 += (4*stbir__simdfX_float_count); ) stbIF1( input1 += (4*stbir__simdfX_float_count); ) stbIF2( input2 += (4*stbir__simdfX_float_count); ) stbIF3( input3 += (4*stbir__simdfX_float_count); ) stbIF4( input4 += (4*stbir__simdfX_float_count); ) stbIF5( input5 += (4*stbir__simdfX_float_count); ) stbIF6( input6 += (4*stbir__simdfX_float_count); ) stbIF7( input7 += (4*stbir__simdfX_float_count); )
+ }
+
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ while ( ( (char*)input0_end - (char*) input0 ) >= 16 )
+ {
+ stbir__simdf o0, r0;
+ STBIR_SIMD_NO_UNROLL(output);
+
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( stbir__simdf_load( o0, output ); stbir__simdf_load( r0, input0 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c0 ) ); )
+ #else
+ stbIF0( stbir__simdf_load( r0, input0 ); stbir__simdf_mult( o0, r0, stbir__if_simdf8_cast_to_simdf4( c0 ) ); )
+ #endif
+ stbIF1( stbir__simdf_load( r0, input1 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c1 ) ); )
+ stbIF2( stbir__simdf_load( r0, input2 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c2 ) ); )
+ stbIF3( stbir__simdf_load( r0, input3 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c3 ) ); )
+ stbIF4( stbir__simdf_load( r0, input4 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c4 ) ); )
+ stbIF5( stbir__simdf_load( r0, input5 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c5 ) ); )
+ stbIF6( stbir__simdf_load( r0, input6 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c6 ) ); )
+ stbIF7( stbir__simdf_load( r0, input7 ); stbir__simdf_madd( o0, o0, r0, stbir__if_simdf8_cast_to_simdf4( c7 ) ); )
+
+ stbir__simdf_store( output, o0 );
+ output += 4;
+ stbIF0( input0 += 4; ) stbIF1( input1 += 4; ) stbIF2( input2 += 4; ) stbIF3( input3 += 4; ) stbIF4( input4 += 4; ) stbIF5( input5 += 4; ) stbIF6( input6 += 4; ) stbIF7( input7 += 4; )
+ }
+ }
+ #else
+ STBIR_NO_UNROLL_LOOP_START
+ while ( ( (char*)input0_end - (char*) input0 ) >= 16 )
+ {
+ float o0, o1, o2, o3;
+ STBIR_NO_UNROLL(output);
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( o0 = output[0] + input0[0] * c0s; o1 = output[1] + input0[1] * c0s; o2 = output[2] + input0[2] * c0s; o3 = output[3] + input0[3] * c0s; )
+ #else
+ stbIF0( o0 = input0[0] * c0s; o1 = input0[1] * c0s; o2 = input0[2] * c0s; o3 = input0[3] * c0s; )
+ #endif
+ stbIF1( o0 += input1[0] * c1s; o1 += input1[1] * c1s; o2 += input1[2] * c1s; o3 += input1[3] * c1s; )
+ stbIF2( o0 += input2[0] * c2s; o1 += input2[1] * c2s; o2 += input2[2] * c2s; o3 += input2[3] * c2s; )
+ stbIF3( o0 += input3[0] * c3s; o1 += input3[1] * c3s; o2 += input3[2] * c3s; o3 += input3[3] * c3s; )
+ stbIF4( o0 += input4[0] * c4s; o1 += input4[1] * c4s; o2 += input4[2] * c4s; o3 += input4[3] * c4s; )
+ stbIF5( o0 += input5[0] * c5s; o1 += input5[1] * c5s; o2 += input5[2] * c5s; o3 += input5[3] * c5s; )
+ stbIF6( o0 += input6[0] * c6s; o1 += input6[1] * c6s; o2 += input6[2] * c6s; o3 += input6[3] * c6s; )
+ stbIF7( o0 += input7[0] * c7s; o1 += input7[1] * c7s; o2 += input7[2] * c7s; o3 += input7[3] * c7s; )
+ output[0] = o0; output[1] = o1; output[2] = o2; output[3] = o3;
+ output += 4;
+ stbIF0( input0 += 4; ) stbIF1( input1 += 4; ) stbIF2( input2 += 4; ) stbIF3( input3 += 4; ) stbIF4( input4 += 4; ) stbIF5( input5 += 4; ) stbIF6( input6 += 4; ) stbIF7( input7 += 4; )
+ }
+ #endif
+ STBIR_NO_UNROLL_LOOP_START
+ while ( input0 < input0_end )
+ {
+ float o0;
+ STBIR_NO_UNROLL(output);
+ #ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+ stbIF0( o0 = output[0] + input0[0] * c0s; )
+ #else
+ stbIF0( o0 = input0[0] * c0s; )
+ #endif
+ stbIF1( o0 += input1[0] * c1s; )
+ stbIF2( o0 += input2[0] * c2s; )
+ stbIF3( o0 += input3[0] * c3s; )
+ stbIF4( o0 += input4[0] * c4s; )
+ stbIF5( o0 += input5[0] * c5s; )
+ stbIF6( o0 += input6[0] * c6s; )
+ stbIF7( o0 += input7[0] * c7s; )
+ output[0] = o0;
+ ++output;
+ stbIF0( ++input0; ) stbIF1( ++input1; ) stbIF2( ++input2; ) stbIF3( ++input3; ) stbIF4( ++input4; ) stbIF5( ++input5; ) stbIF6( ++input6; ) stbIF7( ++input7; )
+ }
+}
+
+#undef stbIF0
+#undef stbIF1
+#undef stbIF2
+#undef stbIF3
+#undef stbIF4
+#undef stbIF5
+#undef stbIF6
+#undef stbIF7
+#undef STB_IMAGE_RESIZE_DO_VERTICALS
+#undef STBIR__vertical_channels
+#undef STB_IMAGE_RESIZE_DO_HORIZONTALS
+#undef STBIR_strs_join24
+#undef STBIR_strs_join14
+#undef STBIR_chans
+#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#undef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
+#endif
+
+#else // !STB_IMAGE_RESIZE_DO_VERTICALS
+
+#define STBIR_chans( start, end ) STBIR_strs_join1(start,STBIR__horizontal_channels,end)
+
+#ifndef stbir__2_coeff_only
+#define stbir__2_coeff_only() \
+ stbir__1_coeff_only(); \
+ stbir__1_coeff_remnant(1);
+#endif
+
+#ifndef stbir__2_coeff_remnant
+#define stbir__2_coeff_remnant( ofs ) \
+ stbir__1_coeff_remnant(ofs); \
+ stbir__1_coeff_remnant((ofs)+1);
+#endif
+
+#ifndef stbir__3_coeff_only
+#define stbir__3_coeff_only() \
+ stbir__2_coeff_only(); \
+ stbir__1_coeff_remnant(2);
+#endif
+
+#ifndef stbir__3_coeff_remnant
+#define stbir__3_coeff_remnant( ofs ) \
+ stbir__2_coeff_remnant(ofs); \
+ stbir__1_coeff_remnant((ofs)+2);
+#endif
+
+#ifndef stbir__3_coeff_setup
+#define stbir__3_coeff_setup()
+#endif
+
+#ifndef stbir__4_coeff_start
+#define stbir__4_coeff_start() \
+ stbir__2_coeff_only(); \
+ stbir__2_coeff_remnant(2);
+#endif
+
+#ifndef stbir__4_coeff_continue_from_4
+#define stbir__4_coeff_continue_from_4( ofs ) \
+ stbir__2_coeff_remnant(ofs); \
+ stbir__2_coeff_remnant((ofs)+2);
+#endif
+
+#ifndef stbir__store_output_tiny
+#define stbir__store_output_tiny stbir__store_output
+#endif
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_1_coeff)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__1_coeff_only();
+ stbir__store_output_tiny();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_2_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__2_coeff_only();
+ stbir__store_output_tiny();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_3_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__3_coeff_only();
+ stbir__store_output_tiny();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_4_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_5_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__1_coeff_remnant(4);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_6_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__2_coeff_remnant(4);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_7_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ stbir__3_coeff_setup();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+
+ stbir__4_coeff_start();
+ stbir__3_coeff_remnant(4);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_8_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__4_coeff_continue_from_4(4);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_9_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__4_coeff_continue_from_4(4);
+ stbir__1_coeff_remnant(8);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_10_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__4_coeff_continue_from_4(4);
+ stbir__2_coeff_remnant(8);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_11_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ stbir__3_coeff_setup();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__4_coeff_continue_from_4(4);
+ stbir__3_coeff_remnant(8);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_12_coeffs)( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ float const * hc = horizontal_coefficients;
+ stbir__4_coeff_start();
+ stbir__4_coeff_continue_from_4(4);
+ stbir__4_coeff_continue_from_4(8);
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_n_coeffs_mod0 )( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ int n = ( ( horizontal_contributors->n1 - horizontal_contributors->n0 + 1 ) - 4 + 3 ) >> 2;
+ float const * hc = horizontal_coefficients;
+
+ stbir__4_coeff_start();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ hc += 4;
+ decode += STBIR__horizontal_channels * 4;
+ stbir__4_coeff_continue_from_4( 0 );
+ --n;
+ } while ( n > 0 );
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_n_coeffs_mod1 )( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ int n = ( ( horizontal_contributors->n1 - horizontal_contributors->n0 + 1 ) - 5 + 3 ) >> 2;
+ float const * hc = horizontal_coefficients;
+
+ stbir__4_coeff_start();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ hc += 4;
+ decode += STBIR__horizontal_channels * 4;
+ stbir__4_coeff_continue_from_4( 0 );
+ --n;
+ } while ( n > 0 );
+ stbir__1_coeff_remnant( 4 );
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_n_coeffs_mod2 )( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ int n = ( ( horizontal_contributors->n1 - horizontal_contributors->n0 + 1 ) - 6 + 3 ) >> 2;
+ float const * hc = horizontal_coefficients;
+
+ stbir__4_coeff_start();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ hc += 4;
+ decode += STBIR__horizontal_channels * 4;
+ stbir__4_coeff_continue_from_4( 0 );
+ --n;
+ } while ( n > 0 );
+ stbir__2_coeff_remnant( 4 );
+
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static void STBIR_chans( stbir__horizontal_gather_,_channels_with_n_coeffs_mod3 )( float * output_buffer, unsigned int output_sub_size, float const * decode_buffer, stbir__contributors const * horizontal_contributors, float const * horizontal_coefficients, int coefficient_width )
+{
+ float const * output_end = output_buffer + output_sub_size * STBIR__horizontal_channels;
+ float STBIR_SIMD_STREAMOUT_PTR( * ) output = output_buffer;
+ stbir__3_coeff_setup();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ float const * decode = decode_buffer + horizontal_contributors->n0 * STBIR__horizontal_channels;
+ int n = ( ( horizontal_contributors->n1 - horizontal_contributors->n0 + 1 ) - 7 + 3 ) >> 2;
+ float const * hc = horizontal_coefficients;
+
+ stbir__4_coeff_start();
+ STBIR_SIMD_NO_UNROLL_LOOP_START
+ do {
+ hc += 4;
+ decode += STBIR__horizontal_channels * 4;
+ stbir__4_coeff_continue_from_4( 0 );
+ --n;
+ } while ( n > 0 );
+ stbir__3_coeff_remnant( 4 );
+
+ stbir__store_output();
+ } while ( output < output_end );
+}
+
+static stbir__horizontal_gather_channels_func * STBIR_chans(stbir__horizontal_gather_,_channels_with_n_coeffs_funcs)[4]=
+{
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_n_coeffs_mod0),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_n_coeffs_mod1),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_n_coeffs_mod2),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_n_coeffs_mod3),
+};
+
+static stbir__horizontal_gather_channels_func * STBIR_chans(stbir__horizontal_gather_,_channels_funcs)[12]=
+{
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_1_coeff),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_2_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_3_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_4_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_5_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_6_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_7_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_8_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_9_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_10_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_11_coeffs),
+ STBIR_chans(stbir__horizontal_gather_,_channels_with_12_coeffs),
+};
+
+#undef STBIR__horizontal_channels
+#undef STB_IMAGE_RESIZE_DO_HORIZONTALS
+#undef stbir__1_coeff_only
+#undef stbir__1_coeff_remnant
+#undef stbir__2_coeff_only
+#undef stbir__2_coeff_remnant
+#undef stbir__3_coeff_only
+#undef stbir__3_coeff_remnant
+#undef stbir__3_coeff_setup
+#undef stbir__4_coeff_start
+#undef stbir__4_coeff_continue_from_4
+#undef stbir__store_output
+#undef stbir__store_output_tiny
+#undef STBIR_chans
+
+#endif // HORIZONTALS
+
+#undef STBIR_strs_join2
+#undef STBIR_strs_join1
+
+#endif // STB_IMAGE_RESIZE_DO_HORIZONTALS/VERTICALS/CODERS
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_image_resize_test/dotimings.c b/vendor/stb/stb_image_resize_test/dotimings.c
new file mode 100644
index 0000000..515c5d5
--- /dev/null
+++ b/vendor/stb/stb_image_resize_test/dotimings.c
@@ -0,0 +1,224 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#ifdef _MSC_VER
+
+#define stop() __debugbreak()
+#include <windows.h>
+#define int64 __int64
+#pragma warning(disable:4127)
+
+#define get_milliseconds GetTickCount
+
+#else
+
+#define stop() __builtin_trap()
+#define int64 long long
+
+typedef unsigned int U32;
+typedef unsigned long long U64;
+
+#include <time.h>
+static int get_milliseconds()
+{
+ struct timespec ts;
+ clock_gettime( CLOCK_MONOTONIC, &ts );
+ return (U32) ( ( ((U64)(U32)ts.tv_sec) * 1000LL ) + (U64)(((U32)ts.tv_nsec+500000)/1000000) );
+}
+
+#endif
+
+#if defined(TIME_SIMD)
+ // default for most platforms
+#elif defined(TIME_SCALAR)
+ #define STBIR_NO_SIMD
+#else
+ #error You must define TIME_SIMD or TIME_SCALAR when compiling this file.
+#endif
+
+#define STBIR_PROFILE
+#define STB_IMAGE_RESIZE_IMPLEMENTATION
+#define STBIR__V_FIRST_INFO_BUFFER v_info
+#include "stb_image_resize2.h" // new one!
+
+#if defined(TIME_SIMD) && !defined(STBIR_SIMD)
+#error Timing SIMD, but scalar was ON!
+#endif
+
+#if defined(TIME_SCALAR) && defined(STBIR_SIMD)
+#error Timing scalar, but SIMD was ON!
+#endif
+
+#define HEADER 32
+
+
+static int file_write( const char *filename, void * buffer, size_t size )
+{
+ FILE * f = fopen( filename, "wb" );
+ if ( f == 0 ) return 0;
+ if ( fwrite( buffer, 1, size, f) != size ) { fclose(f); return 0; }
+ fclose(f);
+ return 1;
+}
+
+int64 nresize( void * o, int ox, int oy, int op, void * i, int ix, int iy, int ip, int buf, int type, int edg, int flt )
+{
+ STBIR_RESIZE resize;
+ int t;
+ int64 b;
+
+ stbir_resize_init( &resize, i, ix, iy, ip, o, ox, oy, op, buf, type );
+ stbir_set_edgemodes( &resize, edg, edg );
+ stbir_set_filters( &resize, flt, flt );
+
+ stbir_build_samplers_with_splits( &resize, 1 );
+
+ b = 0x7fffffffffffffffULL;
+ for( t = 0 ; t < 16 ; t++ )
+ {
+ STBIR_PROFILE_INFO profile;
+ int64 v;
+ if(!stbir_resize_extended( &resize ) )
+ stop();
+ stbir_resize_extended_profile_info( &profile, &resize );
+ v = profile.clocks[1]+profile.clocks[2];
+ if ( v < b )
+ {
+ b = v;
+ t = 0;
+ }
+ }
+
+ stbir_free_samplers( &resize );
+
+ return b;
+}
+
+
+#define INSIZES 5
+#define TYPESCOUNT 5
+#define NUM 64
+
+static const int sizes[INSIZES]={63,126,252,520,772};
+static const int types[TYPESCOUNT]={STBIR_1CHANNEL,STBIR_2CHANNEL,STBIR_RGB,STBIR_4CHANNEL,STBIR_RGBA};
+static const int effective[TYPESCOUNT]={1,2,3,4,7};
+
+int main( int argc, char ** argv )
+{
+ unsigned char * input;
+ unsigned char * output;
+ int dimensionx, dimensiony;
+ int scalex, scaley;
+ int totalms;
+ int timing_count;
+ int ir;
+ int * file;
+ int * ts;
+ int64 totalcycles;
+
+ if ( argc != 6 )
+ {
+ printf("command: dotimings x_samps y_samps x_scale y_scale outfilename\n");
+ exit(1);
+ }
+
+ input = malloc( 4*1200*1200 );
+ memset( input, 0x80, 4*1200*1200 );
+ output = malloc( 4*10000*10000ULL );
+
+ dimensionx = atoi( argv[1] );
+ dimensiony = atoi( argv[2] );
+ scalex = atoi( argv[3] );
+ scaley = atoi( argv[4] );
+
+ timing_count = dimensionx * dimensiony * INSIZES * TYPESCOUNT;
+
+ file = malloc( sizeof(int) * ( 2 * timing_count + HEADER ) );
+ ts = file + HEADER;
+
+ totalms = get_milliseconds();
+ totalcycles = STBIR_PROFILE_FUNC();
+ for( ir = 0 ; ir < INSIZES ; ir++ )
+ {
+ int ix, iy, ty;
+ ix = iy = sizes[ir];
+
+ for( ty = 0 ; ty < TYPESCOUNT ; ty++ )
+ {
+ int h, hh;
+
+ h = 1;
+ for( hh = 0 ; hh < dimensiony; hh++ )
+ {
+ int ww, w = 1;
+ for( ww = 0 ; ww < dimensionx; ww++ )
+ {
+ int64 VF, HF;
+ int good;
+
+ v_info.control_v_first = 2; // vertical first
+ VF = nresize( output, w, h, (w*4*1)&~3, input, ix, iy, ix*4*1, types[ty], STBIR_TYPE_UINT8, STBIR_EDGE_CLAMP, STBIR_FILTER_MITCHELL );
+ v_info.control_v_first = 1; // horizontal first
+ HF = nresize( output, w, h, (w*4*1)&~3, input, ix, iy, ix*4*1, types[ty], STBIR_TYPE_UINT8, STBIR_EDGE_CLAMP, STBIR_FILTER_MITCHELL );
+
+ good = ( ((HF<=VF) && (!v_info.v_first)) || ((VF<=HF) && (v_info.v_first)));
+
+// printf("\r%d,%d, %d,%d, %d, %I64d,%I64d, // Good: %c(%c-%d) CompEst: %.1f %.1f\n", ix, iy, w, h, ty, VF, HF, good?'y':'n', v_info.v_first?'v':'h', v_info.v_resize_classification, v_info.v_cost,v_info.h_cost );
+ ts[0] = (int)VF;
+ ts[1] = (int)HF;
+
+ ts += 2;
+
+ w += scalex;
+ }
+ printf(".");
+ h += scaley;
+ }
+ }
+ }
+ totalms = get_milliseconds() - totalms;
+ totalcycles = STBIR_PROFILE_FUNC() - totalcycles;
+
+ printf("\n");
+
+ file[0] = 'VFT1';
+
+ #if defined(_x86_64) || defined( __x86_64__ ) || defined( _M_X64 ) || defined(__x86_64) || defined(__SSE2__) || defined( _M_IX86_FP ) || defined(__i386) || defined( __i386__ ) || defined( _M_IX86 ) || defined( _X86_ )
+ file[1] = 1; // x64
+ #elif defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ ) || defined(__ARM_NEON__) || defined(__ARM_NEON) || defined(__arm__) || defined( _M_ARM )
+ file[1] = 2; // arm
+ #else
+ file[1] = 99; // who knows???
+ #endif
+
+ #ifdef STBIR_SIMD8
+ file[2] = 2; // simd-8
+ #elif defined( STBIR_SIMD )
+ file[2] = 1; // simd-4
+ #else
+ file[2] = 0; // nosimd
+ #endif
+
+ file[3] = dimensionx; // dimx
+ file[4] = dimensiony; // dimy
+ file[5] = TYPESCOUNT; // channel types
+ file[ 6] = types[0]; file[7] = types[1]; file[8] = types[2]; file[9] = types[3]; file[10] = types[4]; // buffer_type
+ file[11] = effective[0]; file[12] = effective[1]; file[13] = effective[2]; file[14] = effective[3]; file[15] = effective[4]; // effective channels
+ file[16] = INSIZES; // resizes
+ file[17] = sizes[0]; file[18] = sizes[0]; // input sizes (w x h)
+ file[19] = sizes[1]; file[20] = sizes[1];
+ file[21] = sizes[2]; file[22] = sizes[2];
+ file[23] = sizes[3]; file[24] = sizes[3];
+ file[25] = sizes[4]; file[26] = sizes[4];
+ file[27] = scalex; file[28] = scaley; // scale the dimx and dimy amounts
+ 0.92 (2017-01-02) fix integer overflow on large (>2GB) images
+ 0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
+ 0.90 (2014-09-17) first released version
+
+ LICENSE
+ See end of file for license information.
+
+ TODO
+ Don't decode all of the image data when only processing a partial tile
+ Don't use full-width decode buffers when only processing a partial tile
+ When processing wide images, break processing into tiles so data fits in L1 cache
+ Installable filters?
+ Resize that respects alpha test coverage
+ (Reference code: FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage:
+ https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp )
+*/
+
+#ifndef STBIR_INCLUDE_STB_IMAGE_RESIZE_H
+#define STBIR_INCLUDE_STB_IMAGE_RESIZE_H
+
+#ifdef _MSC_VER
+typedef unsigned char stbir_uint8;
+typedef unsigned short stbir_uint16;
+typedef unsigned int stbir_uint32;
+typedef unsigned __int64 stbir_uint64;
+#else
+#include <stdint.h>
+typedef uint8_t stbir_uint8;
+typedef uint16_t stbir_uint16;
+typedef uint32_t stbir_uint32;
+typedef uint64_t stbir_uint64;
+#endif
+
+#ifndef STBIRDEF
+#ifdef STB_IMAGE_RESIZE_STATIC
+#define STBIRDEF static
+#else
+#ifdef __cplusplus
+#define STBIRDEF extern "C"
+#else
+#define STBIRDEF extern
+#endif
+#endif
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Easy-to-use API:
+//
+// * "input pixels" points to an array of image data with 'num_channels' channels (e.g. RGB=3, RGBA=4)
+// * input_w is input image width (x-axis), input_h is input image height (y-axis)
+// * stride is the offset between successive rows of image data in memory, in bytes. you can
+// specify 0 to mean packed continuously in memory
+// * alpha channel is treated identically to other channels.
+// * colorspace is linear or sRGB as specified by function name
+// * returned result is 1 for success or 0 in case of an error.
+// #define STBIR_ASSERT() to trigger an assert on parameter validation errors.
+// * Memory required grows approximately linearly with input and output size, but with
+// discontinuities at input_w == output_w and input_h == output_h.
+// * These functions use a "default" resampling filter defined at compile time. To change the filter,
+// you can change the compile-time defaults by #defining STBIR_DEFAULT_FILTER_UPSAMPLE
+// and STBIR_DEFAULT_FILTER_DOWNSAMPLE, or you can use the medium-complexity API.
+
+STBIRDEF int stbir_resize_uint8( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels);
+
+STBIRDEF int stbir_resize_float( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels);
+
+
+// The following functions interpret image data as gamma-corrected sRGB.
+// Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
+// or otherwise provide the index of the alpha channel. Flags value
+// of 0 will probably do the right thing if you're not sure what
+// the flags mean.
+
+#define STBIR_ALPHA_CHANNEL_NONE -1
+
+// Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
+// use alpha-weighted resampling (effectively premultiplying, resampling,
+// then unpremultiplying).
+#define STBIR_FLAG_ALPHA_PREMULTIPLIED (1 << 0)
+// The specified alpha channel should be handled as gamma-corrected value even
+// when doing sRGB operations.
+#define STBIR_FLAG_ALPHA_USES_COLORSPACE (1 << 1)
+
+#define STBIR_FLAG_ALPHA_OUT_PREMULTIPLIED (1 << 2)
+
+STBIRDEF int stbir_resize_uint8_srgb(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags);
+
+
+typedef enum
+{
+ STBIR_EDGE_CLAMP = 1,
+ STBIR_EDGE_REFLECT = 2,
+ STBIR_EDGE_WRAP = 3,
+ STBIR_EDGE_ZERO = 4,
+} stbir_edge;
+
+// This function adds the ability to specify how requests to sample off the edge of the image are handled.
+STBIRDEF int stbir_resize_uint8_srgb_edgemode(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode);
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Medium-complexity API
+//
+// This extends the easy-to-use API as follows:
+//
+// * Alpha-channel can be processed separately
+// * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
+// * Alpha channel will not be gamma corrected (unless flags&STBIR_FLAG_GAMMA_CORRECT)
+// * Filters will be weighted by alpha channel (unless flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
+// * Filter can be selected explicitly
+// * uint16 image type
+// * sRGB colorspace available for all types
+// * context parameter for passing to STBIR_MALLOC
+
+typedef enum
+{
+ STBIR_FILTER_DEFAULT = 0, // use same filter type that easy-to-use API chooses
+ STBIR_FILTER_BOX = 1, // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
+ STBIR_FILTER_TRIANGLE = 2, // On upsampling, produces same results as bilinear texture filtering
+ STBIR_FILTER_CUBICBSPLINE = 3, // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
+ STBIR_FILTER_CATMULLROM = 4, // An interpolating cubic spline
+ STBIR_FILTER_MITCHELL = 5, // Mitchell-Netravali filter with B=1/3, C=1/3
+} stbir_filter;
+
+typedef enum
+{
+ STBIR_COLORSPACE_LINEAR,
+ STBIR_COLORSPACE_SRGB,
+
+ STBIR_MAX_COLORSPACES,
+} stbir_colorspace;
+
+// The following functions are all identical except for the type of the image data
+
+STBIRDEF int stbir_resize_uint8_generic( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context);
+
+STBIRDEF int stbir_resize_uint16_generic(const stbir_uint16 *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ stbir_uint16 *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context);
+
+STBIRDEF int stbir_resize_float_generic( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context);
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Full-complexity API
+//
+// This extends the medium API as follows:
+//
+// * uint32 image type
+// * not typesafe
+// * separate filter types for each axis
+// * separate edge modes for each axis
+// * can specify scale explicitly for subpixel correctness
+// * can specify image source tile using texture coordinates
+
+typedef enum
+{
+ STBIR_TYPE_UINT8 ,
+ STBIR_TYPE_UINT16,
+ STBIR_TYPE_FLOAT ,
+ STBIR_TYPE_UINT32,
+
+ STBIR_MAX_TYPES
+} stbir_datatype;
+
+STBIRDEF int stbir_resize( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context);
+
+STBIRDEF int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float x_scale, float y_scale,
+ float x_offset, float y_offset);
+
+STBIRDEF int stbir_resize_region( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float s0, float t0, float s1, float t1);
+// (s0, t0) & (s1, t1) are the top-left and bottom right corner (uv addressing style: [0, 1]x[0, 1]) of a region of the input image to use.
+
+//
+//
+//// end header file /////////////////////////////////////////////////////
+#endif // STBIR_INCLUDE_STB_IMAGE_RESIZE_H
+
+
+
+
+
+#ifdef STB_IMAGE_RESIZE_IMPLEMENTATION
+
+#ifndef STBIR_ASSERT
+#include <assert.h>
+#define STBIR_ASSERT(x) assert(x)
+#endif
+
+// For memset
+#include <string.h>
+
+#include <math.h>
+
+#ifndef STBIR_MALLOC
+#include <stdlib.h>
+// use comma operator to evaluate c, to avoid "unused parameter" warnings
+#define STBIR_MALLOC(size,c) ((void)(c), malloc(size))
+#define STBIR_FREE(ptr,c) ((void)(c), free(ptr))
+#endif
+
+#ifndef _MSC_VER
+#ifdef __cplusplus
+#define stbir__inline inline
+#else
+#define stbir__inline
+#endif
+#else
+#define stbir__inline __forceinline
+#endif
+
+#ifdef STBIR_PROFILE
+
+union
+{
+ struct { stbir_uint64 total, setup, filters, looping, vertical, horizontal, decode, encode, alpha, unalpha; } named;
+ stbir_uint64 array[10];
+} oldprofile;
+stbir_uint64 * current_zone_excluded_ptr;
+
+#if defined(_x86_64) || defined( __x86_64__ ) || defined( _M_X64 ) || defined(__x86_64) || defined(__SSE2__) || defined(STBIR_SSE) || defined( _M_IX86_FP ) || defined(__i386) || defined( __i386__ ) || defined( _M_IX86 ) || defined( _X86_ )
+
+#ifdef _MSC_VER
+
+ STBIRDEF stbir_uint64 __rdtsc();
+ #define STBIR_PROFILE_FUNC() __rdtsc()
+
+#else // non msvc
+
+ static stbir__inline stbir_uint64 STBIR_PROFILE_FUNC()
+ {
+ stbir_uint32 lo, hi;
+ asm volatile ("rdtsc" : "=a" (lo), "=d" (hi) );
+ return ( ( (stbir_uint64) hi ) << 32 ) | ( (stbir_uint64) lo );
+ }
+
+#endif // msvc
+
+#elif defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ ) || defined(__ARM_NEON__) || defined(__ARM_NEON)
+
+#ifdef _MSC_VER
+
+ #error Not sure what the intrinsic for cntvct_el0 is on MSVC
+
+#else // no msvc
+
+ static stbir__inline stbir_uint64 STBIR_PROFILE_FUNC()
+ {
+ stbir_uint64 tsc;
+ asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
+ return tsc;
+ }
+
+#endif
+
+#else // not x64 or arm
+
+#error Unknown platform for profiling.
+
+#endif // x64 and arm
+
+#define STBIR_PROFILE_START() { stbir_uint64 thiszonetime = STBIR_PROFILE_FUNC(); stbir_uint64 * save_parent_excluded_ptr = current_zone_excluded_ptr; stbir_uint64 current_zone_excluded = 0; current_zone_excluded_ptr = &current_zone_excluded;
+#define STBIR_PROFILE_END( wh ) thiszonetime = STBIR_PROFILE_FUNC() - thiszonetime; oldprofile.named.wh += thiszonetime - current_zone_excluded; *save_parent_excluded_ptr += thiszonetime; current_zone_excluded_ptr = save_parent_excluded_ptr; }
+#define STBIR_PROFILE_FIRST_START() { int i; current_zone_excluded_ptr = &oldprofile.named.total; for(i=0;i<10;i++) oldprofile.array[i] = 0; }
+
+#endif // STBIR_PROFILE
+
+#define STBIR__ARRAY_SIZE(a) (sizeof((a))/sizeof((a)[0]))
+
+#ifndef STBIR_MAX_CHANNELS
+#define STBIR_MAX_CHANNELS 64
+#endif
+
+#if STBIR_MAX_CHANNELS > 65536
+#error "Too many channels; STBIR_MAX_CHANNELS must be no more than 65536."
+// because we store the indices in 16-bit variables
+#endif
+
+// This value is added to alpha just before premultiplication to avoid
+// zeroing out color values. It is equivalent to 2^-80. If you don't want
+// that behavior (it may interfere if you have floating point images with
+// very small alpha values) then you can define STBIR_NO_ALPHA_EPSILON to
+// disable it.
+#ifndef STBIR_ALPHA_EPSILON
+#define STBIR_ALPHA_EPSILON ((float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20))
+#endif
+
+
+
+#ifdef _MSC_VER
+#define STBIR__UNUSED_PARAM(v) (void)(v)
+#else
+#define STBIR__UNUSED_PARAM(v) (void)sizeof(v)
+#endif
+
+// must match stbir_datatype
+static unsigned char stbir__type_size[] = {
+    1, // STBIR_TYPE_UINT8
+    2, // STBIR_TYPE_UINT16
+    4, // STBIR_TYPE_FLOAT
+    4, // STBIR_TYPE_UINT32
+};
+
+// Kernel function centered at 0
+typedef float (stbir__kernel_fn)(float x, float scale);
+typedef float (stbir__support_fn)(float scale);
+
+typedef struct
+{
+ stbir__kernel_fn* kernel;
+ stbir__support_fn* support;
+} stbir__filter_info;
+
+// When upsampling, the contributors are which source pixels contribute.
+// When downsampling, the contributors are which destination pixels are contributed to.
+typedef struct
+{
+ int n0; // First contributing pixel
+ int n1; // Last contributing pixel
+} stbir__contributors;
+
+typedef struct
+{
+ const void* input_data;
+ int input_w;
+ int input_h;
+ int input_stride_bytes;
+
+ void* output_data;
+ int output_w;
+ int output_h;
+ int output_stride_bytes;
+
+ float s0, t0, s1, t1;
+
+ float horizontal_shift; // Units: output pixels
+ float vertical_shift; // Units: output pixels
+ float horizontal_scale;
+ float vertical_scale;
+
+ int channels;
+ int alpha_channel;
+ stbir_uint32 flags;
+ stbir_datatype type;
+ stbir_filter horizontal_filter;
+ stbir_filter vertical_filter;
+ stbir_edge edge_horizontal;
+ stbir_edge edge_vertical;
+ stbir_colorspace colorspace;
+
+ stbir__contributors* horizontal_contributors;
+ float* horizontal_coefficients;
+
+ stbir__contributors* vertical_contributors;
+ float* vertical_coefficients;
+
+ int decode_buffer_pixels;
+ float* decode_buffer;
+
+ float* horizontal_buffer;
+
+ // cache these because ceil/floor are inexplicably showing up in profile
+ int horizontal_coefficient_width;
+ int vertical_coefficient_width;
+ int horizontal_filter_pixel_width;
+ int vertical_filter_pixel_width;
+ int horizontal_filter_pixel_margin;
+ int vertical_filter_pixel_margin;
+ int horizontal_num_contributors;
+ int vertical_num_contributors;
+
+ int ring_buffer_length_bytes; // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
+ int ring_buffer_num_entries; // Total number of entries in the ring buffer.
+ int ring_buffer_first_scanline;
+ int ring_buffer_last_scanline;
+ int ring_buffer_begin_index; // first_scanline is at this index in the ring buffer
+ float* ring_buffer;
+
+ float* encode_buffer; // A temporary buffer to store floats so we don't lose precision while we do multiply-adds.
+
+ int horizontal_contributors_size;
+ int horizontal_coefficients_size;
+ int vertical_contributors_size;
+ int vertical_coefficients_size;
+ int decode_buffer_size;
+ int horizontal_buffer_size;
+ int ring_buffer_size;
+ int encode_buffer_size;
+} ostbir__info;
+
+
+static const float stbir__max_uint8_as_float = 255.0f;
+static const float stbir__max_uint16_as_float = 65535.0f;
+static const double stbir__max_uint32_as_float = 4294967295.0;
+
+
+static stbir__inline int stbir__min(int a, int b)
+{
+ return a < b ? a : b;
+}
+
+static stbir__inline float stbir__saturate(float x)
+{
+ if (x < 0)
+ return 0;
+
+ if (x > 1)
+ return 1;
+
+ return x;
+}
+
+#ifdef STBIR_SATURATE_INT
+static stbir__inline stbir_uint8 stbir__saturate8(int x)
+{
+ if ((unsigned int) x <= 255)
+ return (stbir_uint8) x;
+
+ if (x < 0)
+ return 0;
+
+ return 255;
+}
+
+static stbir__inline stbir_uint16 stbir__saturate16(int x)
+{
+ if ((unsigned int) x <= 65535)
+ return (stbir_uint16) x;
+
+ if (x < 0)
+ return 0;
+
+ return 65535;
+}
+#endif
+
+static float stbir__srgb_uchar_to_linear_float[256] = {
+ 0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
+ 0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
+ 0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
+ 0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
+ 0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
+ 0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
+ 0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
+ 0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
+ 0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
+ 0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
+ 0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
+ 0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
+ 0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
+ 0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
+ 0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
+ 0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
+ 0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
+ 0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
+ 0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
+ 0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
+ 0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
+ 0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
+ 0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
+ 0.982251f, 0.991102f, 1.0f
+};
+
+static float stbir__srgb_to_linear(float f)
+{
+ if (f <= 0.04045f)
+ return f / 12.92f;
+ else
+ return (float)pow((f + 0.055f) / 1.055f, 2.4f);
+}
+
+static float stbir__linear_to_srgb(float f)
+{
+ if (f <= 0.0031308f)
+ return f * 12.92f;
+ else
+ return 1.055f * (float)pow(f, 1 / 2.4f) - 0.055f;
+}
+
+#ifndef STBIR_NON_IEEE_FLOAT
+// From https://gist.github.com/rygorous/2203834
+
+typedef union
+{
+ stbir_uint32 u;
+ float f;
+} stbir__FP32;
+
+static const stbir_uint32 fp32_to_srgb8_tab4[104] = {
+ 0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
+ 0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
+ 0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
+ 0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
+ 0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
+ 0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
+ 0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
+ 0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
+ 0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
+ 0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
+ 0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
+ 0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
+ 0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
+};
+
+static stbir_uint8 stbir__linear_to_srgb_uchar(float in)
+{
+ static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
+ static const stbir__FP32 minval = { (127-13) << 23 };
+ stbir_uint32 tab,bias,scale,t;
+ stbir__FP32 f;
+
+ // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
+ // The tests are carefully written so that NaNs map to 0, same as in the reference
+ // implementation.
+ if (!(in > minval.f)) // written this way to catch NaNs
+ in = minval.f;
+ if (in > almostone.f)
+ in = almostone.f;
+
+ // Do the table lookup and unpack bias, scale
+ f.f = in;
+ tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
+ bias = (tab >> 16) << 9;
+ scale = tab & 0xffff;
+
+ // Grab next-highest mantissa bits and perform linear interpolation
+ t = (f.u >> 12) & 0xff;
+ return (unsigned char) ((bias + scale*t) >> 16);
+}
+
+#else
+// sRGB transition values, scaled by 1<<28
+static int stbir__srgb_offset_to_linear_scaled[256] =
+{
+ 0, 40738, 122216, 203693, 285170, 366648, 448125, 529603,
+ 611080, 692557, 774035, 855852, 942009, 1033024, 1128971, 1229926,
+ 1335959, 1447142, 1563542, 1685229, 1812268, 1944725, 2082664, 2226148,
+ 2375238, 2529996, 2690481, 2856753, 3028870, 3206888, 3390865, 3580856,
+ 3776916, 3979100, 4187460, 4402049, 4622919, 4850123, 5083710, 5323731,
+ 5570236, 5823273, 6082892, 6349140, 6622065, 6901714, 7188133, 7481369,
+ 7781466, 8088471, 8402427, 8723380, 9051372, 9386448, 9728650, 10078021,
+ 10434603, 10798439, 11169569, 11548036, 11933879, 12327139, 12727857, 13136073,
+ 13551826, 13975156, 14406100, 14844697, 15290987, 15745007, 16206795, 16676389,
+ 17153826, 17639142, 18132374, 18633560, 19142734, 19659934, 20185196, 20718552,
+ 21260042, 21809696, 22367554, 22933648, 23508010, 24090680, 24681686, 25281066,
+ 25888850, 26505076, 27129772, 27762974, 28404716, 29055026, 29713942, 30381490,
+ 31057708, 31742624, 32436272, 33138682, 33849884, 34569912, 35298800, 36036568,
+ 36783260, 37538896, 38303512, 39077136, 39859796, 40651528, 41452360, 42262316,
+ 43081432, 43909732, 44747252, 45594016, 46450052, 47315392, 48190064, 49074096,
+ 49967516, 50870356, 51782636, 52704392, 53635648, 54576432, 55526772, 56486700,
+ 57456236, 58435408, 59424248, 60422780, 61431036, 62449032, 63476804, 64514376,
+ 65561776, 66619028, 67686160, 68763192, 69850160, 70947088, 72053992, 73170912,
+ 74297864, 75434880, 76581976, 77739184, 78906536, 80084040, 81271736, 82469648,
+ 83677792, 84896192, 86124888, 87363888, 88613232, 89872928, 91143016, 92423512,
+ 93714432, 95015816, 96327688, 97650056, 98982952, 100326408, 101680440, 103045072,
+ 104420320, 105806224, 107202800, 108610064, 110028048, 111456776, 112896264, 114346544,
+ 115807632, 117279552, 118762328, 120255976, 121760536, 123276016, 124802440, 126339832,
+ 127888216, 129447616, 131018048, 132599544, 134192112, 135795792, 137410592, 139036528,
+ 140673648, 142321952, 143981456, 145652208, 147334208, 149027488, 150732064, 152447968,
+ 154175200, 155913792, 157663776, 159425168, 161197984, 162982240, 164777968, 166585184,
+ 168403904, 170234160, 172075968, 173929344, 175794320, 177670896, 179559120, 181458992,
+ 183370528, 185293776, 187228736, 189175424, 191133888, 193104112, 195086128, 197079968,
+ 199085648, 201103184, 203132592, 205173888, 207227120, 209292272, 211369392, 213458480,
+ 215559568, 217672656, 219797792, 221934976, 224084240, 226245600, 228419056, 230604656,
+ 232802400, 235012320, 237234432, 239468736, 241715280, 243974080, 246245120, 248528464,
+ 250824112, 253132064, 255452368, 257785040, 260130080, 262487520, 264857376, 267239664,
+};
+
+static stbir_uint8 stbir__linear_to_srgb_uchar(float f)
+{
+ int x = (int) (f * (1 << 28)); // has headroom so you don't need to clamp
+ int v = 0;
+ int i;
+
+ // Refine the guess with a short binary search.
+ i = v + 128; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 64; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 32; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 16; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 8; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 4; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 2; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+ i = v + 1; if (x >= stbir__srgb_offset_to_linear_scaled[i]) v = i;
+
+ return (stbir_uint8) v;
+}
+#endif
+
+static float stbir__filter_trapezoid(float x, float scale)
+{
+ float halfscale = scale / 2;
+ float t = 0.5f + halfscale;
+ STBIR_ASSERT(scale <= 1);
+
+ x = (float)fabs(x);
+
+ if (x >= t)
+ return 0;
+ else
+ {
+ float r = 0.5f - halfscale;
+ if (x <= r)
+ return 1;
+ else
+ return (t - x) / scale;
+ }
+}
+
+static float stbir__support_trapezoid(float scale)
+{
+ STBIR_ASSERT(scale <= 1);
+ return 0.5f + scale / 2;
+}
+
+static float stbir__filter_triangle(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x <= 1.0f)
+ return 1 - x;
+ else
+ return 0;
+}
+
+static float stbir__filter_cubic(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x < 1.0f)
+ return (4 + x*x*(3*x - 6))/6;
+ else if (x < 2.0f)
+ return (8 + x*(-12 + x*(6 - x)))/6;
+
+ return (0.0f);
+}
+
+static float stbir__filter_catmullrom(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x < 1.0f)
+ return 1 - x*x*(2.5f - 1.5f*x);
+ else if (x < 2.0f)
+ return 2 - x*(4 + x*(0.5f*x - 2.5f));
+
+ return (0.0f);
+}
+
+static float stbir__filter_mitchell(float x, float s)
+{
+ STBIR__UNUSED_PARAM(s);
+
+ x = (float)fabs(x);
+
+ if (x < 1.0f)
+ return (16 + x*x*(21 * x - 36))/18;
+ else if (x < 2.0f)
+ return (32 + x*(-60 + x*(36 - 7*x)))/18;
+
+ return (0.0f);
+}
+
+static float stbir__support_zero(float s)
+{
+ STBIR__UNUSED_PARAM(s);
+ return 0;
+}
+
+static float stbir__support_one(float s)
+{
+ STBIR__UNUSED_PARAM(s);
+ return 1;
+}
+
+static float stbir__support_two(float s)
+{
+ STBIR__UNUSED_PARAM(s);
+ return 2;
+}
+
+static stbir__filter_info stbir__filter_info_table[] = {
+ { NULL, stbir__support_zero },
+ { stbir__filter_trapezoid, stbir__support_trapezoid },
+ { stbir__filter_triangle, stbir__support_one },
+ { stbir__filter_cubic, stbir__support_two },
+ { stbir__filter_catmullrom, stbir__support_two },
+ { stbir__filter_mitchell, stbir__support_two },
+};
+
+stbir__inline static int stbir__use_upsampling(float ratio)
+{
+ return ratio > 1;
+}
+
+stbir__inline static int stbir__use_width_upsampling(stbir__info* stbir_info)
+{
+ return stbir__use_upsampling(stbir_info->horizontal_scale);
+}
+
+stbir__inline static int stbir__use_height_upsampling(stbir__info* stbir_info)
+{
+ return stbir__use_upsampling(stbir_info->vertical_scale);
+}
+
+// This is the maximum number of input samples that can affect an output sample
+// with the given filter
+static int stbir__get_filter_pixel_width(stbir_filter filter, float scale)
+{
+ STBIR_ASSERT(filter != 0);
+ STBIR_ASSERT(filter < STBIR__ARRAY_SIZE(stbir__filter_info_table));
+
+ if (stbir__use_upsampling(scale))
+ return (int)ceil(stbir__filter_info_table[filter].support(1/scale) * 2);
+ else
+ return (int)ceil(stbir__filter_info_table[filter].support(scale) * 2 / scale);
+}
+
+// This is how much to expand buffers to account for filters seeking outside
+// the image boundaries.
+static int stbir__get_filter_pixel_margin(stbir_filter filter, float scale)
+{
+ return stbir__get_filter_pixel_width(filter, scale) / 2;
+}
+
+static int stbir__get_coefficient_width(stbir_filter filter, float scale)
+{
+ if (stbir__use_upsampling(scale))
+ return (int)ceil(stbir__filter_info_table[filter].support(1 / scale) * 2);
+ else
+ return (int)ceil(stbir__filter_info_table[filter].support(scale) * 2);
+}
+
+static int stbir__get_contributors(float scale, stbir_filter filter, int input_size, int output_size)
+{
+ if (stbir__use_upsampling(scale))
+ return output_size;
+ else
+ return (input_size + stbir__get_filter_pixel_margin(filter, scale) * 2);
+}
+
+static int stbir__get_total_horizontal_coefficients(stbir__info* info)
+{
+ return info->horizontal_num_contributors
+ * stbir__get_coefficient_width (info->horizontal_filter, info->horizontal_scale);
+}
+
+static int stbir__get_total_vertical_coefficients(stbir__info* info)
+{
+ return info->vertical_num_contributors
+ * stbir__get_coefficient_width (info->vertical_filter, info->vertical_scale);
+}
+
+static stbir__contributors* stbir__get_contributor(stbir__contributors* contributors, int n)
+{
+ return &contributors[n];
+}
+
+// For perf reasons this code is duplicated in stbir__resample_horizontal_upsample/downsample,
+// if you change it here, change it there too.
+static float* stbir__get_coefficient(float* coefficients, stbir_filter filter, float scale, int n, int c)
+{
+ int width = stbir__get_coefficient_width(filter, scale);
+ return &coefficients[width*n + c];
+}
+
+static int stbir__edge_wrap_slow(stbir_edge edge, int n, int max)
+{
+ switch (edge)
+ {
+ case STBIR_EDGE_ZERO:
+ return 0; // we'll decode the wrong pixel here, and then overwrite with 0s later
+
+ case STBIR_EDGE_CLAMP:
+ if (n < 0)
+ return 0;
+
+ if (n >= max)
+ return max - 1;
+
+ return n; // NOTREACHED
+
+ case STBIR_EDGE_REFLECT:
+ {
+ if (n < 0)
+ {
+ if (n > -max)
+ return -n;
+ else
+ return max - 1;
+ }
+
+ if (n >= max)
+ {
+ int max2 = max * 2;
+ if (n >= max2)
+ return 0;
+ else
+ return max2 - n - 1;
+ }
+
+ return n; // NOTREACHED
+ }
+
+ case STBIR_EDGE_WRAP:
+ if (n >= 0)
+ return (n % max);
+ else
+ {
+ int m = (-n) % max;
+
+ if (m != 0)
+ m = max - m;
+
+ return (m);
+ }
+ // NOTREACHED
+
+ default:
+ STBIR_ASSERT(!"Unimplemented edge type");
+ return 0;
+ }
+}
+
+stbir__inline static int stbir__edge_wrap(stbir_edge edge, int n, int max)
+{
+ // avoid per-pixel switch
+ if (n >= 0 && n < max)
+ return n;
+ return stbir__edge_wrap_slow(edge, n, max);
+}
+
+// What input pixels contribute to this output pixel?
+static void stbir__calculate_sample_range_upsample(int n, float out_filter_radius, float scale_ratio, float out_shift, int* in_first_pixel, int* in_last_pixel, float* in_center_of_out)
+{
+ float out_pixel_center = (float)n + 0.5f;
+ float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
+ float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
+
+ float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) / scale_ratio;
+ float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) / scale_ratio;
+
+ *in_center_of_out = (out_pixel_center + out_shift) / scale_ratio;
+ *in_first_pixel = (int)(floor(in_pixel_influence_lowerbound + 0.5));
+ *in_last_pixel = (int)(floor(in_pixel_influence_upperbound - 0.5));
+}
+
+// What output pixels does this input pixel contribute to?
+static void stbir__calculate_sample_range_downsample(int n, float in_pixels_radius, float scale_ratio, float out_shift, int* out_first_pixel, int* out_last_pixel, float* out_center_of_in)
+{
+ float in_pixel_center = (float)n + 0.5f;
+ float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
+ float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
+
+ float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale_ratio - out_shift;
+ float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale_ratio - out_shift;
+
+ *out_center_of_in = in_pixel_center * scale_ratio - out_shift;
+ *out_first_pixel = (int)(floor(out_pixel_influence_lowerbound + 0.5));
+ *out_last_pixel = (int)(floor(out_pixel_influence_upperbound - 0.5));
+}
+
+static void stbir__calculate_coefficients_upsample(stbir_filter filter, float scale, int in_first_pixel, int in_last_pixel, float in_center_of_out, stbir__contributors* contributor, float* coefficient_group)
+{
+ int i;
+ float total_filter = 0;
+ float filter_scale;
+
+ STBIR_ASSERT(in_last_pixel - in_first_pixel <= (int)ceil(stbir__filter_info_table[filter].support(1/scale) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
+
+ contributor->n0 = in_first_pixel;
+ contributor->n1 = in_last_pixel;
+
+ STBIR_ASSERT(contributor->n1 >= contributor->n0);
+
+ for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
+ {
+ float in_pixel_center = (float)(i + in_first_pixel) + 0.5f;
+ coefficient_group[i] = stbir__filter_info_table[filter].kernel(in_center_of_out - in_pixel_center, 1 / scale);
+
+ // If the coefficient is zero, skip it. (Don't do the <0 check here, we want the influence of those outside pixels.)
+ if (i == 0 && !coefficient_group[i])
+ {
+ contributor->n0 = ++in_first_pixel;
+ i--;
+ continue;
+ }
+
+ total_filter += coefficient_group[i];
+ }
+
+ STBIR_ASSERT(stbir__filter_info_table[filter].kernel((float)(in_last_pixel + 1) + 0.5f - in_center_of_out, 1/scale) == 0);
+
+ STBIR_ASSERT(total_filter > 0.9f);
+ STBIR_ASSERT(total_filter < 1.1f); // Make sure it's not way off.
+
+ // Make sure the sum of all coefficients is 1.
+ filter_scale = 1 / total_filter;
+
+ for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
+ coefficient_group[i] *= filter_scale;
+
+ for (i = in_last_pixel - in_first_pixel; i >= 0; i--)
+ {
+ if (coefficient_group[i])
+ break;
+
+ // This coefficient is zero; trim it from the end of the range.
+ contributor->n1 = contributor->n0 + i - 1;
+ }
+}
+
+static void stbir__calculate_coefficients_downsample(stbir_filter filter, float scale_ratio, int out_first_pixel, int out_last_pixel, float out_center_of_in, stbir__contributors* contributor, float* coefficient_group)
+{
+ int i;
+
+ STBIR_ASSERT(out_last_pixel - out_first_pixel <= (int)ceil(stbir__filter_info_table[filter].support(scale_ratio) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
+
+ contributor->n0 = out_first_pixel;
+ contributor->n1 = out_last_pixel;
+
+ STBIR_ASSERT(contributor->n1 >= contributor->n0);
+
+ for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
+ {
+ float out_pixel_center = (float)(i + out_first_pixel) + 0.5f;
+ float x = out_pixel_center - out_center_of_in;
+ coefficient_group[i] = stbir__filter_info_table[filter].kernel(x, scale_ratio) * scale_ratio;
+ }
+
+ STBIR_ASSERT(stbir__filter_info_table[filter].kernel((float)(out_last_pixel + 1) + 0.5f - out_center_of_in, scale_ratio) == 0);
+
+ for (i = out_last_pixel - out_first_pixel; i >= 0; i--)
+ {
+ if (coefficient_group[i])
+ break;
+
+ // This coefficient is zero; trim it from the end of the range.
+ contributor->n1 = contributor->n0 + i - 1;
+ }
+}
+
+static void stbir__normalize_downsample_coefficients(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, int input_size, int output_size)
+{
+ int num_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
+ int num_coefficients = stbir__get_coefficient_width(filter, scale_ratio);
+ int i, j;
+ int skip;
+
+ for (i = 0; i < output_size; i++)
+ {
+ float scale;
+ float total = 0;
+
+ for (j = 0; j < num_contributors; j++)
+ {
+ if (i >= contributors[j].n0 && i <= contributors[j].n1)
+ {
+ float coefficient = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0);
+ total += coefficient;
+ }
+ else if (i < contributors[j].n0)
+ break;
+ }
+
+ //STBIR_ASSERT(total > 0.9f);
+ //STBIR_ASSERT(total < 1.5f);
+
+ scale = 1 / total;
+
+ for (j = 0; j < num_contributors; j++)
+ {
+ if (i >= contributors[j].n0 && i <= contributors[j].n1)
+ *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0) *= scale;
+ else if (i < contributors[j].n0)
+ break;
+ }
+ }
+
+ // Optimize: Skip zero coefficients and contributions outside of image bounds.
+ // Do this after normalizing because normalization depends on the n0/n1 values.
+ for (j = 0; j < num_contributors; j++)
+ {
+ int range, max, width;
+
+ skip = 0;
+ while (*stbir__get_coefficient(coefficients, filter, scale_ratio, j, skip) == 0)
+ skip++;
+
+ contributors[j].n0 += skip;
+
+ while (contributors[j].n0 < 0)
+ {
+ contributors[j].n0++;
+ skip++;
+ }
+
+ range = contributors[j].n1 - contributors[j].n0 + 1;
+ max = stbir__min(num_coefficients, range);
+
+ width = stbir__get_coefficient_width(filter, scale_ratio);
+ for (i = 0; i < max; i++)
+ {
+ if (i + skip >= width)
+ break;
+
+ *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i) = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i + skip);
+ }
+ }
+
+ // Using min to avoid writing into invalid pixels.
+ for (i = 0; i < num_contributors; i++)
+ contributors[i].n1 = stbir__min(contributors[i].n1, output_size - 1);
+}
+
+// Each scan line uses the same kernel values, so calculate them once and
+// reuse them for every scan line.
+static void stbir__calculate_filters(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, float shift, int input_size, int output_size)
+{
+ int n;
+ int total_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
+
+ if (stbir__use_upsampling(scale_ratio))
+ {
+ float out_pixels_radius = stbir__filter_info_table[filter].support(1 / scale_ratio) * scale_ratio;
+
+ // Looping through out pixels
+ for (n = 0; n < total_contributors; n++)
+ {
+ float in_center_of_out; // Center of the current out pixel in the in pixel space
+ int in_first_pixel, in_last_pixel;
+
+ stbir__calculate_sample_range_upsample(n, out_pixels_radius, scale_ratio, shift, &in_first_pixel, &in_last_pixel, &in_center_of_out);
+
+ stbir__calculate_coefficients_upsample(filter, scale_ratio, in_first_pixel, in_last_pixel, in_center_of_out, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
+ }
+ }
+ else
+ {
+ float in_pixels_radius = stbir__filter_info_table[filter].support(scale_ratio) / scale_ratio;
+
+ // Looping through in pixels
+ for (n = 0; n < total_contributors; n++)
+ {
+ float out_center_of_in; // Center of the current in pixel in the out pixel space
+ int out_first_pixel, out_last_pixel;
+ int n_adjusted = n - stbir__get_filter_pixel_margin(filter, scale_ratio);
+
+ stbir__calculate_sample_range_downsample(n_adjusted, in_pixels_radius, scale_ratio, shift, &out_first_pixel, &out_last_pixel, &out_center_of_in);
+
+ stbir__calculate_coefficients_downsample(filter, scale_ratio, out_first_pixel, out_last_pixel, out_center_of_in, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
+ }
+
+ stbir__normalize_downsample_coefficients(contributors, coefficients, filter, scale_ratio, input_size, output_size);
+ }
+}
+
+static float* stbir__get_decode_buffer(stbir__info* stbir_info)
+{
+ // The 0 index of the decode buffer starts after the margin. This makes
+ // it okay to use negative indexes on the decode buffer.
+ return &stbir_info->decode_buffer[stbir_info->horizontal_filter_pixel_margin * stbir_info->channels];
+}
+
+#define STBIR__DECODE(type, colorspace) ((type) * (STBIR_MAX_COLORSPACES) + (colorspace))
+
+static void stbir__decode_scanline(stbir__info* stbir_info, int n)
+{
+ int c;
+ int channels = stbir_info->channels;
+ int alpha_channel = stbir_info->alpha_channel;
+ int type = stbir_info->type;
+ int colorspace = stbir_info->colorspace;
+ int input_w = stbir_info->input_w;
+ size_t input_stride_bytes = stbir_info->input_stride_bytes;
+ float* decode_buffer = stbir__get_decode_buffer(stbir_info);
+ stbir_edge edge_horizontal = stbir_info->edge_horizontal;
+ stbir_edge edge_vertical = stbir_info->edge_vertical;
+ size_t in_buffer_row_offset = stbir__edge_wrap(edge_vertical, n, stbir_info->input_h) * input_stride_bytes;
+ const void* input_data = (char *) stbir_info->input_data + in_buffer_row_offset;
+ int max_x = input_w + stbir_info->horizontal_filter_pixel_margin;
+ int decode = STBIR__DECODE(type, colorspace);
+
+ int x = -stbir_info->horizontal_filter_pixel_margin;
+
+ // special-case STBIR_EDGE_ZERO because it must produce a value that never appears in the input,
+ // and we don't want to pay that overhead per pixel for the other edge modes
+ if (edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info->input_h))
+ {
+ for (; x < max_x; x++)
+ for (c = 0; c < channels; c++)
+ decode_buffer[x*channels + c] = 0;
+ return;
+ }
+
+ STBIR_PROFILE_START( );
+ switch (decode)
+ {
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = ((float)((const unsigned char*)input_data)[input_pixel_index + c]) / stbir__max_uint8_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[((const unsigned char*)input_data)[input_pixel_index + c]];
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = ((float)((const unsigned char*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint8_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = ((float)((const unsigned short*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(((float)((const unsigned short*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float);
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = ((float)((const unsigned short*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint16_as_float;
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = (float)(((double)((const unsigned int*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((float)(((double)((const unsigned int*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float));
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = (float)(((double)((const unsigned int*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint32_as_float);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = ((const float*)input_data)[input_pixel_index + c];
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
+ for (; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+ int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
+ for (c = 0; c < channels; c++)
+ decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(((const float*)input_data)[input_pixel_index + c]);
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ decode_buffer[decode_pixel_index + alpha_channel] = ((const float*)input_data)[input_pixel_index + alpha_channel];
+ }
+
+ break;
+
+ default:
+ STBIR_ASSERT(!"Unknown type/colorspace/channels combination.");
+ break;
+ }
+ STBIR_PROFILE_END( decode );
+
+ if (!(stbir_info->flags & STBIR_FLAG_ALPHA_PREMULTIPLIED))
+ {
+ STBIR_PROFILE_START();
+
+ for (x = -stbir_info->horizontal_filter_pixel_margin; x < max_x; x++)
+ {
+ int decode_pixel_index = x * channels;
+
+ // An alpha of 0 would zero out the color channels when premultiplying, so for integer types we add a small epsilon to keep them recoverable.
+ float alpha = decode_buffer[decode_pixel_index + alpha_channel];
+#ifndef STBIR_NO_ALPHA_EPSILON
+ if (stbir_info->type != STBIR_TYPE_FLOAT) {
+ alpha += STBIR_ALPHA_EPSILON;
+ decode_buffer[decode_pixel_index + alpha_channel] = alpha;
+ }
+#endif
+ for (c = 0; c < channels; c++)
+ {
+ if (c == alpha_channel)
+ continue;
+
+ decode_buffer[decode_pixel_index + c] *= alpha;
+ }
+ }
+ STBIR_PROFILE_END( alpha );
+ }
+
+ if (edge_horizontal == STBIR_EDGE_ZERO)
+ {
+ for (x = -stbir_info->horizontal_filter_pixel_margin; x < 0; x++)
+ {
+ for (c = 0; c < channels; c++)
+ decode_buffer[x*channels + c] = 0;
+ }
+ for (x = input_w; x < max_x; x++)
+ {
+ for (c = 0; c < channels; c++)
+ decode_buffer[x*channels + c] = 0;
+ }
+ }
+}
+
+static float* stbir__get_ring_buffer_entry(float* ring_buffer, int index, int ring_buffer_length)
+{
+ return &ring_buffer[index * ring_buffer_length];
+}
+
+static float* stbir__add_empty_ring_buffer_entry(stbir__info* stbir_info, int n)
+{
+ int ring_buffer_index;
+ float* ring_buffer;
+
+ stbir_info->ring_buffer_last_scanline = n;
+
+ if (stbir_info->ring_buffer_begin_index < 0)
+ {
+ ring_buffer_index = stbir_info->ring_buffer_begin_index = 0;
+ stbir_info->ring_buffer_first_scanline = n;
+ }
+ else
+ {
+ ring_buffer_index = (stbir_info->ring_buffer_begin_index + (stbir_info->ring_buffer_last_scanline - stbir_info->ring_buffer_first_scanline)) % stbir_info->ring_buffer_num_entries;
+ STBIR_ASSERT(ring_buffer_index != stbir_info->ring_buffer_begin_index);
+ }
+
+ ring_buffer = stbir__get_ring_buffer_entry(stbir_info->ring_buffer, ring_buffer_index, stbir_info->ring_buffer_length_bytes / sizeof(float));
+
+ memset(ring_buffer, 0, stbir_info->ring_buffer_length_bytes);
+
+ return ring_buffer;
+}
+
+
+static void stbir__resample_horizontal_upsample(stbir__info* stbir_info, float* output_buffer)
+{
+ int x, k;
+ int output_w = stbir_info->output_w;
+ int channels = stbir_info->channels;
+ float* decode_buffer = stbir__get_decode_buffer(stbir_info);
+ stbir__contributors* horizontal_contributors = stbir_info->horizontal_contributors;
+ float* horizontal_coefficients = stbir_info->horizontal_coefficients;
+ int coefficient_width = stbir_info->horizontal_coefficient_width;
+
+ STBIR_PROFILE_START( );
+ for (x = 0; x < output_w; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int out_pixel_index = x * channels;
+ int coefficient_group = coefficient_width * x;
+ int coefficient_counter = 0;
+
+ STBIR_ASSERT(n1 >= n0);
+ STBIR_ASSERT(n0 >= -stbir_info->horizontal_filter_pixel_margin);
+ STBIR_ASSERT(n1 >= -stbir_info->horizontal_filter_pixel_margin);
+ STBIR_ASSERT(n0 < stbir_info->input_w + stbir_info->horizontal_filter_pixel_margin);
+ STBIR_ASSERT(n1 < stbir_info->input_w + stbir_info->horizontal_filter_pixel_margin);
+
+ switch (channels) {
+ case 1:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 1;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ }
+ break;
+ case 2:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 2;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ }
+ break;
+ case 3:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 3;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ }
+ break;
+ case 4:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * 4;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
+ }
+ break;
+ default:
+ for (k = n0; k <= n1; k++)
+ {
+ int in_pixel_index = k * channels;
+ float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
+ int c;
+ STBIR_ASSERT(coefficient != 0);
+ for (c = 0; c < channels; c++)
+ output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
+ }
+ break;
+ }
+ }
+ STBIR_PROFILE_END( horizontal );
+}
+
+static void stbir__resample_horizontal_downsample(stbir__info* stbir_info, float* output_buffer)
+{
+ int x, k;
+ int input_w = stbir_info->input_w;
+ int channels = stbir_info->channels;
+ float* decode_buffer = stbir__get_decode_buffer(stbir_info);
+ stbir__contributors* horizontal_contributors = stbir_info->horizontal_contributors;
+ float* horizontal_coefficients = stbir_info->horizontal_coefficients;
+ int coefficient_width = stbir_info->horizontal_coefficient_width;
+ int filter_pixel_margin = stbir_info->horizontal_filter_pixel_margin;
+ int max_x = input_w + filter_pixel_margin * 2;
+
+ STBIR_ASSERT(!stbir__use_width_upsampling(stbir_info));
+
+ STBIR_PROFILE_START( );
+ switch (channels) {
+ case 1:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 1;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 1;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ }
+ }
+ break;
+
+ case 2:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 2;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 2;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ }
+ }
+ break;
+
+ case 3:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 3;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 3;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ }
+ }
+ break;
+
+ case 4:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * 4;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int out_pixel_index = k * 4;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ STBIR_ASSERT(coefficient != 0);
+ output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
+ output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
+ output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
+ output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
+ }
+ }
+ break;
+
+ default:
+ for (x = 0; x < max_x; x++)
+ {
+ int n0 = horizontal_contributors[x].n0;
+ int n1 = horizontal_contributors[x].n1;
+
+ int in_x = x - filter_pixel_margin;
+ int in_pixel_index = in_x * channels;
+ int max_n = n1;
+ int coefficient_group = coefficient_width * x;
+
+ for (k = n0; k <= max_n; k++)
+ {
+ int c;
+ int out_pixel_index = k * channels;
+ float coefficient = horizontal_coefficients[coefficient_group + k - n0];
+ STBIR_ASSERT(coefficient != 0);
+ for (c = 0; c < channels; c++)
+ output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
+ }
+ }
+ break;
+ }
+ STBIR_PROFILE_END( horizontal );
+}
+
+static void stbir__decode_and_resample_upsample(stbir__info* stbir_info, int n)
+{
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline(stbir_info, n);
+
+ // Now resample it into the ring buffer.
+ if (stbir__use_width_upsampling(stbir_info))
+ stbir__resample_horizontal_upsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
+ else
+ stbir__resample_horizontal_downsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
+
+ // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
+}
+
+static void stbir__decode_and_resample_downsample(stbir__info* stbir_info, int n)
+{
+ // Decode the nth scanline from the source image into the decode buffer.
+ stbir__decode_scanline(stbir_info, n);
+
+ memset(stbir_info->horizontal_buffer, 0, stbir_info->output_w * stbir_info->channels * sizeof(float));
+
+ // Now resample it into the horizontal buffer.
+ if (stbir__use_width_upsampling(stbir_info))
+ stbir__resample_horizontal_upsample(stbir_info, stbir_info->horizontal_buffer);
+ else
+ stbir__resample_horizontal_downsample(stbir_info, stbir_info->horizontal_buffer);
+
+ // Now it's sitting in the horizontal buffer ready to be distributed into the ring buffers.
+}
+
+// Get the specified scan line from the ring buffer.
+static float* stbir__get_ring_buffer_scanline(int get_scanline, float* ring_buffer, int begin_index, int first_scanline, int ring_buffer_num_entries, int ring_buffer_length)
+{
+ int ring_buffer_index = (begin_index + (get_scanline - first_scanline)) % ring_buffer_num_entries;
+ return stbir__get_ring_buffer_entry(ring_buffer, ring_buffer_index, ring_buffer_length);
+}
+
+
+static void stbir__encode_scanline(stbir__info* stbir_info, int num_pixels, void *output_buffer, float *encode_buffer, int channels, int alpha_channel, int decode)
+{
+ int x;
+ int n;
+ int num_nonalpha;
+ stbir_uint16 nonalpha[STBIR_MAX_CHANNELS];
+
+ if ((!(stbir_info->flags&STBIR_FLAG_ALPHA_OUT_PREMULTIPLIED))&&(alpha_channel!=-1))
+ {
+ STBIR_PROFILE_START( );
+
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ float alpha = encode_buffer[pixel_index + alpha_channel];
+ float reciprocal_alpha = alpha ? 1.0f / alpha : 0;
+
+ // unrolling this produced a 1% slowdown upscaling a large RGBA linear-space image on my machine - stb
+ for (n = 0; n < channels; n++)
+ if (n != alpha_channel)
+ encode_buffer[pixel_index + n] *= reciprocal_alpha;
+
+ // We added in a small epsilon to prevent the color channel from being deleted with zero alpha.
+ // Because we only add it for integer types, it will automatically be discarded on integer
+ // conversion, so we don't need to subtract it back out (which would be problematic for
+ // numeric precision reasons).
+ }
+ STBIR_PROFILE_END( unalpha );
+ }
+
+ // build a table of all channels that need colorspace correction, so
+ // we don't perform colorspace correction on channels that don't need it.
+ for (x = 0, num_nonalpha = 0; x < channels; ++x)
+ {
+ if (x != alpha_channel || (stbir_info->flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ {
+ nonalpha[num_nonalpha++] = (stbir_uint16)x;
+ }
+ }
+
+ #define STBIR__ROUND_INT(f) ((int) ((f)+0.5))
+ #define STBIR__ROUND_UINT(f) ((stbir_uint32) ((f)+0.5))
+
+ #ifdef STBIR__SATURATE_INT
+ #define STBIR__ENCODE_LINEAR8(f) stbir__saturate8 (STBIR__ROUND_INT((f) * stbir__max_uint8_as_float ))
+ #define STBIR__ENCODE_LINEAR16(f) stbir__saturate16(STBIR__ROUND_INT((f) * stbir__max_uint16_as_float))
+ #else
+ #define STBIR__ENCODE_LINEAR8(f) (unsigned char ) STBIR__ROUND_INT(stbir__saturate(f) * stbir__max_uint8_as_float )
+ #define STBIR__ENCODE_LINEAR16(f) (unsigned short) STBIR__ROUND_INT(stbir__saturate(f) * stbir__max_uint16_as_float)
+ #endif
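+
+ // For example, assuming stbir__max_uint8_as_float is 255.0f:
+ // STBIR__ENCODE_LINEAR8(1.0f) == 255, and STBIR__ENCODE_LINEAR8(0.5f) == 128
+ // (0.5f * 255.0f = 127.5, plus the 0.5 rounding bias, truncates to 128).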
+
+ STBIR_PROFILE_START( );
+
+ switch (decode)
+ {
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((unsigned char*)output_buffer)[index] = STBIR__ENCODE_LINEAR8(encode_buffer[index]);
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((unsigned char*)output_buffer)[index] = stbir__linear_to_srgb_uchar(encode_buffer[index]);
+ }
+
+ if (!(stbir_info->flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((unsigned char *)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR8(encode_buffer[pixel_index+alpha_channel]);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((unsigned short*)output_buffer)[index] = STBIR__ENCODE_LINEAR16(encode_buffer[index]);
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((unsigned short*)output_buffer)[index] = (unsigned short)STBIR__ROUND_INT(stbir__linear_to_srgb(stbir__saturate(encode_buffer[index])) * stbir__max_uint16_as_float);
+ }
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((unsigned short*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR16(encode_buffer[pixel_index + alpha_channel]);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((unsigned int*)output_buffer)[index] = (unsigned int)STBIR__ROUND_UINT(((double)stbir__saturate(encode_buffer[index])) * stbir__max_uint32_as_float);
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ ((unsigned int*)output_buffer)[index] = (unsigned int)STBIR__ROUND_UINT(((double)stbir__linear_to_srgb(stbir__saturate(encode_buffer[index]))) * stbir__max_uint32_as_float);
+ }
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ ((unsigned int*)output_buffer)[pixel_index + alpha_channel] = (unsigned int)STBIR__ROUND_UINT(((double)stbir__saturate(encode_buffer[pixel_index + alpha_channel])) * stbir__max_uint32_as_float);
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < channels; n++)
+ {
+ int index = pixel_index + n;
+ ((float*)output_buffer)[index] = encode_buffer[index];
+ }
+ }
+ break;
+
+ case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
+ for (x=0; x < num_pixels; ++x)
+ {
+ int pixel_index = x*channels;
+
+ for (n = 0; n < num_nonalpha; n++)
+ {
+ int index = pixel_index + nonalpha[n];
+ float p = encode_buffer[index];
+ if ( p <= 0 ) p = 0;
+ else if ( p >= 1.0f ) p = 1.0f;
+ ((float*)output_buffer)[index] = stbir__linear_to_srgb(p);
+ }
+
+ if (!(stbir_info->flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
+ {
+ float p = encode_buffer[pixel_index + alpha_channel];
+ if ( p <= 0 ) p = 0;
+ else if ( p >= 1.0f ) p = 1.0f;
+ ((float*)output_buffer)[pixel_index + alpha_channel] = p;
+ }
+ }
+ break;
+
+ default:
+ STBIR_ASSERT(!"Unknown type/colorspace/channels combination.");
+ break;
+ }
+ STBIR_PROFILE_END( encode );
+}
+
+static void stbir__resample_vertical_upsample(stbir__info* stbir_info, int n)
+{
+ int x, k;
+ int output_w = stbir_info->output_w;
+ stbir__contributors* vertical_contributors = stbir_info->vertical_contributors;
+ float* vertical_coefficients = stbir_info->vertical_coefficients;
+ int channels = stbir_info->channels;
+ int alpha_channel = stbir_info->alpha_channel;
+ int type = stbir_info->type;
+ int colorspace = stbir_info->colorspace;
+ int ring_buffer_entries = stbir_info->ring_buffer_num_entries;
+ void* output_data = stbir_info->output_data;
+ float* encode_buffer = stbir_info->encode_buffer;
+ int decode = STBIR__DECODE(type, colorspace);
+ int coefficient_width = stbir_info->vertical_coefficient_width;
+ int coefficient_counter;
+ int contributor = n;
+
+ float* ring_buffer = stbir_info->ring_buffer;
+ int ring_buffer_begin_index = stbir_info->ring_buffer_begin_index;
+ int ring_buffer_first_scanline = stbir_info->ring_buffer_first_scanline;
+ int ring_buffer_length = stbir_info->ring_buffer_length_bytes/sizeof(float);
+
+ int n0,n1, output_row_start;
+ int coefficient_group = coefficient_width * contributor;
+
+ n0 = vertical_contributors[contributor].n0;
+ n1 = vertical_contributors[contributor].n1;
+
+ output_row_start = n * stbir_info->output_stride_bytes;
+
+ STBIR_ASSERT(stbir__use_height_upsampling(stbir_info));
+
+ STBIR_PROFILE_START( );
+
+ memset(encode_buffer, 0, output_w * sizeof(float) * channels);
+
+ // I tried reblocking this for better cache usage of encode_buffer
+ // (using x_outer, k, x_inner), but it lost speed. -- stb
+
+ coefficient_counter = 0;
+ switch (channels) {
+ case 1:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 1;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ }
+ }
+ break;
+ case 2:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 2;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
+ }
+ }
+ break;
+ case 3:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 3;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
+ encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
+ }
+ }
+ break;
+ case 4:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * 4;
+ encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
+ encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
+ encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
+ encode_buffer[in_pixel_index + 3] += ring_buffer_entry[in_pixel_index + 3] * coefficient;
+ }
+ }
+ break;
+ default:
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = coefficient_counter++;
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+ for (x = 0; x < output_w; ++x)
+ {
+ int in_pixel_index = x * channels;
+ int c;
+ for (c = 0; c < channels; c++)
+ encode_buffer[in_pixel_index + c] += ring_buffer_entry[in_pixel_index + c] * coefficient;
+ }
+ }
+ break;
+ }
+ STBIR_PROFILE_END( vertical );
+ stbir__encode_scanline(stbir_info, output_w, (char *) output_data + output_row_start, encode_buffer, channels, alpha_channel, decode);
+}
+
+static void stbir__resample_vertical_downsample(stbir__info* stbir_info, int n)
+{
+ int x, k;
+ int output_w = stbir_info->output_w;
+ stbir__contributors* vertical_contributors = stbir_info->vertical_contributors;
+ float* vertical_coefficients = stbir_info->vertical_coefficients;
+ int channels = stbir_info->channels;
+ int ring_buffer_entries = stbir_info->ring_buffer_num_entries;
+ float* horizontal_buffer = stbir_info->horizontal_buffer;
+ int coefficient_width = stbir_info->vertical_coefficient_width;
+ int contributor = n + stbir_info->vertical_filter_pixel_margin;
+
+ float* ring_buffer = stbir_info->ring_buffer;
+ int ring_buffer_begin_index = stbir_info->ring_buffer_begin_index;
+ int ring_buffer_first_scanline = stbir_info->ring_buffer_first_scanline;
+ int ring_buffer_length = stbir_info->ring_buffer_length_bytes/sizeof(float);
+ int n0,n1;
+
+ n0 = vertical_contributors[contributor].n0;
+ n1 = vertical_contributors[contributor].n1;
+
+ STBIR_ASSERT(!stbir__use_height_upsampling(stbir_info));
+
+ STBIR_PROFILE_START( );
+ for (k = n0; k <= n1; k++)
+ {
+ int coefficient_index = k - n0;
+ int coefficient_group = coefficient_width * contributor;
+ float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
+
+ float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
+
+ switch (channels) {
+ case 1:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 1;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ }
+ break;
+ case 2:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 2;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
+ }
+ break;
+ case 3:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 3;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
+ ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
+ }
+ break;
+ case 4:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * 4;
+ ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
+ ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
+ ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
+ ring_buffer_entry[in_pixel_index + 3] += horizontal_buffer[in_pixel_index + 3] * coefficient;
+ }
+ break;
+ default:
+ for (x = 0; x < output_w; x++)
+ {
+ int in_pixel_index = x * channels;
+
+ int c;
+ for (c = 0; c < channels; c++)
+ ring_buffer_entry[in_pixel_index + c] += horizontal_buffer[in_pixel_index + c] * coefficient;
+ }
+ break;
+ }
+ }
+ STBIR_PROFILE_END( vertical );
+}
+
+static void stbir__buffer_loop_upsample(stbir__info* stbir_info)
+{
+ int y;
+ float scale_ratio = stbir_info->vertical_scale;
+ float out_scanlines_radius = stbir__filter_info_table[stbir_info->vertical_filter].support(1/scale_ratio) * scale_ratio;
+
+ STBIR_ASSERT(stbir__use_height_upsampling(stbir_info));
+
+ for (y = 0; y < stbir_info->output_h; y++)
+ {
+ float in_center_of_out = 0; // Center of the current out scanline in the in scanline space
+ int in_first_scanline = 0, in_last_scanline = 0;
+
+ stbir__calculate_sample_range_upsample(y, out_scanlines_radius, scale_ratio, stbir_info->vertical_shift, &in_first_scanline, &in_last_scanline, &in_center_of_out);
+
+ STBIR_ASSERT(in_last_scanline - in_first_scanline + 1 <= stbir_info->ring_buffer_num_entries);
+
+ if (stbir_info->ring_buffer_begin_index >= 0)
+ {
+ // Get rid of whatever we don't need anymore.
+ while (in_first_scanline > stbir_info->ring_buffer_first_scanline)
+ {
+ if (stbir_info->ring_buffer_first_scanline == stbir_info->ring_buffer_last_scanline)
+ {
+ // We just popped the last scanline off the ring buffer.
+ // Reset it to the empty state.
+ stbir_info->ring_buffer_begin_index = -1;
+ stbir_info->ring_buffer_first_scanline = 0;
+ stbir_info->ring_buffer_last_scanline = 0;
+ break;
+ }
+ else
+ {
+ stbir_info->ring_buffer_first_scanline++;
+ stbir_info->ring_buffer_begin_index = (stbir_info->ring_buffer_begin_index + 1) % stbir_info->ring_buffer_num_entries;
+ }
+ }
+ }
+
+ // Load in new ones.
+ if (stbir_info->ring_buffer_begin_index < 0)
+ stbir__decode_and_resample_upsample(stbir_info, in_first_scanline);
+
+ while (in_last_scanline > stbir_info->ring_buffer_last_scanline)
+ stbir__decode_and_resample_upsample(stbir_info, stbir_info->ring_buffer_last_scanline + 1);
+
+ // Now all buffers should be ready to write a row of vertical sampling.
+ stbir__resample_vertical_upsample(stbir_info, y);
+
+ STBIR_PROGRESS_REPORT((float)y / stbir_info->output_h);
+ }
+}
+
+static void stbir__empty_ring_buffer(stbir__info* stbir_info, int first_necessary_scanline)
+{
+ int output_stride_bytes = stbir_info->output_stride_bytes;
+ int channels = stbir_info->channels;
+ int alpha_channel = stbir_info->alpha_channel;
+ int type = stbir_info->type;
+ int colorspace = stbir_info->colorspace;
+ int output_w = stbir_info->output_w;
+ void* output_data = stbir_info->output_data;
+ int decode = STBIR__DECODE(type, colorspace);
+
+ float* ring_buffer = stbir_info->ring_buffer;
+ int ring_buffer_length = stbir_info->ring_buffer_length_bytes/sizeof(float);
+
+ if (stbir_info->ring_buffer_begin_index >= 0)
+ {
+ // Get rid of whatever we don't need anymore.
+ while (first_necessary_scanline > stbir_info->ring_buffer_first_scanline)
+ {
+ if (stbir_info->ring_buffer_first_scanline >= 0 && stbir_info->ring_buffer_first_scanline < stbir_info->output_h)
+ {
+ int output_row_start = stbir_info->ring_buffer_first_scanline * output_stride_bytes;
+ float* ring_buffer_entry = stbir__get_ring_buffer_entry(ring_buffer, stbir_info->ring_buffer_begin_index, ring_buffer_length);
+ stbir__encode_scanline(stbir_info, output_w, (char *) output_data + output_row_start, ring_buffer_entry, channels, alpha_channel, decode);
+ STBIR_PROGRESS_REPORT((float)stbir_info->ring_buffer_first_scanline / stbir_info->output_h);
+ }
+
+ if (stbir_info->ring_buffer_first_scanline == stbir_info->ring_buffer_last_scanline)
+ {
+ // We just popped the last scanline off the ring buffer.
+ // Reset it to the empty state.
+ stbir_info->ring_buffer_begin_index = -1;
+ stbir_info->ring_buffer_first_scanline = 0;
+ stbir_info->ring_buffer_last_scanline = 0;
+ break;
+ }
+ else
+ {
+ stbir_info->ring_buffer_first_scanline++;
+ stbir_info->ring_buffer_begin_index = (stbir_info->ring_buffer_begin_index + 1) % stbir_info->ring_buffer_num_entries;
+ }
+ }
+ }
+}
+
+static void stbir__buffer_loop_downsample(stbir__info* stbir_info)
+{
+ int y;
+ float scale_ratio = stbir_info->vertical_scale;
+ int output_h = stbir_info->output_h;
+ float in_pixels_radius = stbir__filter_info_table[stbir_info->vertical_filter].support(scale_ratio) / scale_ratio;
+ int pixel_margin = stbir_info->vertical_filter_pixel_margin;
+ int max_y = stbir_info->input_h + pixel_margin;
+
+ STBIR_ASSERT(!stbir__use_height_upsampling(stbir_info));
+
+ for (y = -pixel_margin; y < max_y; y++)
+ {
+ float out_center_of_in; // Center of the current out scanline in the in scanline space
+ int out_first_scanline, out_last_scanline;
+
+ stbir__calculate_sample_range_downsample(y, in_pixels_radius, scale_ratio, stbir_info->vertical_shift, &out_first_scanline, &out_last_scanline, &out_center_of_in);
+
+ STBIR_ASSERT(out_last_scanline - out_first_scanline + 1 <= stbir_info->ring_buffer_num_entries);
+
+ if (out_last_scanline < 0 || out_first_scanline >= output_h)
+ continue;
+
+ stbir__empty_ring_buffer(stbir_info, out_first_scanline);
+
+ stbir__decode_and_resample_downsample(stbir_info, y);
+
+ // Load in new ones.
+ if (stbir_info->ring_buffer_begin_index < 0)
+ stbir__add_empty_ring_buffer_entry(stbir_info, out_first_scanline);
+
+ while (out_last_scanline > stbir_info->ring_buffer_last_scanline)
+ stbir__add_empty_ring_buffer_entry(stbir_info, stbir_info->ring_buffer_last_scanline + 1);
+
+ // Now the horizontal buffer is ready to write to all ring buffer rows.
+ stbir__resample_vertical_downsample(stbir_info, y);
+ }
+
+ stbir__empty_ring_buffer(stbir_info, stbir_info->output_h);
+}
+
+static void stbir__setup(stbir__info *info, int input_w, int input_h, int output_w, int output_h, int channels)
+{
+ info->input_w = input_w;
+ info->input_h = input_h;
+ info->output_w = output_w;
+ info->output_h = output_h;
+ info->channels = channels;
+}
+
+static void stbir__calculate_transform(stbir__info *info, float s0, float t0, float s1, float t1, float *transform)
+{
+ info->s0 = s0;
+ info->t0 = t0;
+ info->s1 = s1;
+ info->t1 = t1;
+
+ if (transform)
+ {
+ info->horizontal_scale = transform[0];
+ info->vertical_scale = transform[1];
+ info->horizontal_shift = transform[2];
+ info->vertical_shift = transform[3];
+ }
+ else
+ {
+ info->horizontal_scale = ((float)info->output_w / info->input_w) / (s1 - s0);
+ info->vertical_scale = ((float)info->output_h / info->input_h) / (t1 - t0);
+
+ info->horizontal_shift = s0 * info->output_w / (s1 - s0);
+ info->vertical_shift = t0 * info->output_h / (t1 - t0);
+ }
+}
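+
+// Example with hypothetical numbers: resizing a 100-wide input to 50 wide over
+// the full source rect (s0=0, s1=1) gives horizontal_scale = (50/100)/1 = 0.5
+// and horizontal_shift = 0. Sampling only the right half (s0=0.5, s1=1.0)
+// gives horizontal_scale = 0.5/0.5 = 1.0 and horizontal_shift = 0.5*50/0.5 = 50.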
+
+static void stbir__choose_filter(stbir__info *info, stbir_filter h_filter, stbir_filter v_filter)
+{
+ if (h_filter == 0)
+ h_filter = stbir__use_upsampling(info->horizontal_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
+ if (v_filter == 0)
+ v_filter = stbir__use_upsampling(info->vertical_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
+ info->horizontal_filter = h_filter;
+ info->vertical_filter = v_filter;
+}
+
+static stbir_uint32 stbir__calculate_memory(stbir__info *info)
+{
+ int pixel_margin = stbir__get_filter_pixel_margin(info->horizontal_filter, info->horizontal_scale);
+ int filter_height = stbir__get_filter_pixel_width(info->vertical_filter, info->vertical_scale);
+
+ info->horizontal_num_contributors = stbir__get_contributors(info->horizontal_scale, info->horizontal_filter, info->input_w, info->output_w);
+ info->vertical_num_contributors = stbir__get_contributors(info->vertical_scale , info->vertical_filter , info->input_h, info->output_h);
+
+ // One extra entry because floating point precision problems sometimes cause an extra to be necessary.
+ info->ring_buffer_num_entries = filter_height + 1;
+
+ info->horizontal_contributors_size = info->horizontal_num_contributors * sizeof(stbir__contributors);
+ info->horizontal_coefficients_size = stbir__get_total_horizontal_coefficients(info) * sizeof(float);
+ info->vertical_contributors_size = info->vertical_num_contributors * sizeof(stbir__contributors);
+ info->vertical_coefficients_size = stbir__get_total_vertical_coefficients(info) * sizeof(float);
+ info->decode_buffer_size = (info->input_w + pixel_margin * 2) * info->channels * sizeof(float);
+ info->horizontal_buffer_size = info->output_w * info->channels * sizeof(float);
+ info->ring_buffer_size = info->output_w * info->channels * info->ring_buffer_num_entries * sizeof(float);
+ info->encode_buffer_size = info->output_w * info->channels * sizeof(float);
+
+ STBIR_ASSERT(info->horizontal_filter != 0);
+ STBIR_ASSERT(info->horizontal_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table)); // this now happens too late
+ STBIR_ASSERT(info->vertical_filter != 0);
+ STBIR_ASSERT(info->vertical_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table)); // this now happens too late
+
+ if (stbir__use_height_upsampling(info))
+ // The horizontal buffer is for when we're downsampling the height and we
+ // can't output the result of sampling the decode buffer directly into the
+ // ring buffers.
+ info->horizontal_buffer_size = 0;
+ else
+ // The encode buffer is to retain precision in the height upsampling method
+ // and isn't used when height downsampling.
+ info->encode_buffer_size = 0;
+
+ return info->horizontal_contributors_size + info->horizontal_coefficients_size
+ + info->vertical_contributors_size + info->vertical_coefficients_size
+ + info->decode_buffer_size + info->horizontal_buffer_size
+ + info->ring_buffer_size + info->encode_buffer_size;
+}
+
+static int stbir__resize_allocated(stbir__info *info,
+ const void* input_data, int input_stride_in_bytes,
+ void* output_data, int output_stride_in_bytes,
+ int alpha_channel, stbir_uint32 flags, stbir_datatype type,
+ stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace,
+ void* tempmem, size_t tempmem_size_in_bytes)
+{
+ size_t memory_required = stbir__calculate_memory(info);
+
+ int width_stride_input = input_stride_in_bytes ? input_stride_in_bytes : info->channels * info->input_w * stbir__type_size[type];
+ int width_stride_output = output_stride_in_bytes ? output_stride_in_bytes : info->channels * info->output_w * stbir__type_size[type];
+
+#ifdef STBIR_DEBUG_OVERWRITE_TEST
+#define OVERWRITE_ARRAY_SIZE 8
+ unsigned char overwrite_output_before_pre[OVERWRITE_ARRAY_SIZE];
+ unsigned char overwrite_tempmem_before_pre[OVERWRITE_ARRAY_SIZE];
+ unsigned char overwrite_output_after_pre[OVERWRITE_ARRAY_SIZE];
+ unsigned char overwrite_tempmem_after_pre[OVERWRITE_ARRAY_SIZE];
+
+ size_t begin_forbidden = width_stride_output * (info->output_h - 1) + info->output_w * info->channels * stbir__type_size[type];
+ memcpy(overwrite_output_before_pre, &((unsigned char*)output_data)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE);
+ memcpy(overwrite_output_after_pre, &((unsigned char*)output_data)[begin_forbidden], OVERWRITE_ARRAY_SIZE);
+ memcpy(overwrite_tempmem_before_pre, &((unsigned char*)tempmem)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE);
+ memcpy(overwrite_tempmem_after_pre, &((unsigned char*)tempmem)[tempmem_size_in_bytes], OVERWRITE_ARRAY_SIZE);
+#endif
+
+ STBIR_ASSERT(info->channels >= 0);
+ STBIR_ASSERT(info->channels <= STBIR_MAX_CHANNELS);
+
+ if (info->channels < 0 || info->channels > STBIR_MAX_CHANNELS)
+ return 0;
+
+ STBIR_ASSERT(info->horizontal_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table));
+ STBIR_ASSERT(info->vertical_filter < STBIR__ARRAY_SIZE(stbir__filter_info_table));
+
+ if (info->horizontal_filter >= STBIR__ARRAY_SIZE(stbir__filter_info_table))
+ return 0;
+ if (info->vertical_filter >= STBIR__ARRAY_SIZE(stbir__filter_info_table))
+ return 0;
+
+ if (alpha_channel < 0)
+ flags |= STBIR_FLAG_ALPHA_USES_COLORSPACE | STBIR_FLAG_ALPHA_PREMULTIPLIED;
+
+ if (!(flags&STBIR_FLAG_ALPHA_USES_COLORSPACE) || !(flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)) {
+ STBIR_ASSERT(alpha_channel >= 0 && alpha_channel < info->channels);
+ }
+
+ if (alpha_channel >= info->channels)
+ return 0;
+
+ STBIR_ASSERT(tempmem);
+
+ if (!tempmem)
+ return 0;
+
+ STBIR_ASSERT(tempmem_size_in_bytes >= memory_required);
+
+ if (tempmem_size_in_bytes < memory_required)
+ return 0;
+
+ memset(tempmem, 0, tempmem_size_in_bytes);
+
+ info->input_data = input_data;
+ info->input_stride_bytes = width_stride_input;
+
+ info->output_data = output_data;
+ info->output_stride_bytes = width_stride_output;
+
+ info->alpha_channel = alpha_channel;
+ info->flags = flags;
+ info->type = type;
+ info->edge_horizontal = edge_horizontal;
+ info->edge_vertical = edge_vertical;
+ info->colorspace = colorspace;
+
+ STBIR_PROFILE_START();
+
+ info->horizontal_coefficient_width = stbir__get_coefficient_width (info->horizontal_filter, info->horizontal_scale);
+ info->vertical_coefficient_width = stbir__get_coefficient_width (info->vertical_filter , info->vertical_scale );
+ info->horizontal_filter_pixel_width = stbir__get_filter_pixel_width (info->horizontal_filter, info->horizontal_scale);
+ info->vertical_filter_pixel_width = stbir__get_filter_pixel_width (info->vertical_filter , info->vertical_scale );
+ info->horizontal_filter_pixel_margin = stbir__get_filter_pixel_margin(info->horizontal_filter, info->horizontal_scale);
+ info->vertical_filter_pixel_margin = stbir__get_filter_pixel_margin(info->vertical_filter , info->vertical_scale );
+
+ info->ring_buffer_length_bytes = info->output_w * info->channels * sizeof(float);
+ info->decode_buffer_pixels = info->input_w + info->horizontal_filter_pixel_margin * 2;
+
+#define STBIR__NEXT_MEMPTR(current, newtype) (newtype*)(((unsigned char*)current) + current##_size)
+
+ info->horizontal_contributors = (stbir__contributors *) tempmem;
+ info->horizontal_coefficients = STBIR__NEXT_MEMPTR(info->horizontal_contributors, float);
+ info->vertical_contributors = STBIR__NEXT_MEMPTR(info->horizontal_coefficients, stbir__contributors);
+ info->vertical_coefficients = STBIR__NEXT_MEMPTR(info->vertical_contributors, float);
+ info->decode_buffer = STBIR__NEXT_MEMPTR(info->vertical_coefficients, float);
+
+ if (stbir__use_height_upsampling(info))
+ {
+ info->horizontal_buffer = NULL;
+ info->ring_buffer = STBIR__NEXT_MEMPTR(info->decode_buffer, float);
+ info->encode_buffer = STBIR__NEXT_MEMPTR(info->ring_buffer, float);
+
+ STBIR_ASSERT((size_t)STBIR__NEXT_MEMPTR(info->encode_buffer, unsigned char) == (size_t)tempmem + tempmem_size_in_bytes);
+ }
+ else
+ {
+ info->horizontal_buffer = STBIR__NEXT_MEMPTR(info->decode_buffer, float);
+ info->ring_buffer = STBIR__NEXT_MEMPTR(info->horizontal_buffer, float);
+ info->encode_buffer = NULL;
+
+ STBIR_ASSERT((size_t)STBIR__NEXT_MEMPTR(info->ring_buffer, unsigned char) == (size_t)tempmem + tempmem_size_in_bytes);
+ }
+
+#undef STBIR__NEXT_MEMPTR
+
+ // This signals that the ring buffer is empty
+ info->ring_buffer_begin_index = -1;
+
+ stbir__calculate_filters(info->horizontal_contributors, info->horizontal_coefficients, info->horizontal_filter, info->horizontal_scale, info->horizontal_shift, info->input_w, info->output_w);
+ stbir__calculate_filters(info->vertical_contributors, info->vertical_coefficients, info->vertical_filter, info->vertical_scale, info->vertical_shift, info->input_h, info->output_h);
+ STBIR_PROFILE_END( filters );
+
+ STBIR_PROGRESS_REPORT(0);
+
+ STBIR_PROFILE_START();
+ if (stbir__use_height_upsampling(info))
+ {
+ stbir__buffer_loop_upsample(info);
+ }
+ else
+ {
+ stbir__buffer_loop_downsample(info);
+ }
+ STBIR_PROFILE_END( looping );
+
+
+ STBIR_PROGRESS_REPORT(1);
+
+#ifdef STBIR_DEBUG_OVERWRITE_TEST
+ STBIR_ASSERT(memcmp(overwrite_output_before_pre, &((unsigned char*)output_data)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE) == 0);
+ STBIR_ASSERT(memcmp(overwrite_output_after_pre, &((unsigned char*)output_data)[begin_forbidden], OVERWRITE_ARRAY_SIZE) == 0);
+ STBIR_ASSERT(memcmp(overwrite_tempmem_before_pre, &((unsigned char*)tempmem)[-OVERWRITE_ARRAY_SIZE], OVERWRITE_ARRAY_SIZE) == 0);
+ STBIR_ASSERT(memcmp(overwrite_tempmem_after_pre, &((unsigned char*)tempmem)[tempmem_size_in_bytes], OVERWRITE_ARRAY_SIZE) == 0);
+#endif
+
+ return 1;
+}
+
+
+static int stbir__resize_arbitrary(
+ void *alloc_context,
+ const void* input_data, int input_w, int input_h, int input_stride_in_bytes,
+ void* output_data, int output_w, int output_h, int output_stride_in_bytes,
+ float s0, float t0, float s1, float t1, float *transform,
+ int channels, int alpha_channel, stbir_uint32 flags, stbir_datatype type,
+ stbir_filter h_filter, stbir_filter v_filter,
+ stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace)
+{
+ stbir__info info;
+ int result;
+ size_t memory_required;
+ void* extra_memory;
+
+ STBIR_PROFILE_FIRST_START();
+
+ stbir__setup(&info, input_w, input_h, output_w, output_h, channels);
+ stbir__calculate_transform(&info, s0,t0,s1,t1,transform);
+ stbir__choose_filter(&info, h_filter, v_filter);
+ memory_required = stbir__calculate_memory(&info);
+ extra_memory = STBIR_MALLOC(memory_required, alloc_context);
+
+ if (!extra_memory)
+ {
+ return 0;
+ }
+
+ result = stbir__resize_allocated(&info, input_data, input_stride_in_bytes,
+ output_data, output_stride_in_bytes,
+ alpha_channel, flags, type,
+ edge_horizontal, edge_vertical,
+ colorspace, extra_memory, memory_required);
+
+ STBIR_PROFILE_END( setup );
+
+ STBIR_FREE(extra_memory, alloc_context);
+
+ return result;
+}
+
+STBIRDEF int stbir_resize_uint8( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,-1,0, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
+}
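+
+// Typical usage (a sketch with made-up dimensions; passing 0 for either stride
+// means "tightly packed", i.e. width * num_channels bytes per row):
+//
+//    unsigned char in [100*100*4]; // 100x100 RGBA source
+//    unsigned char out[ 50* 50*4]; // 50x50 RGBA destination
+//    stbir_resize_uint8(in, 100, 100, 0, out, 50, 50, 0, 4);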
+
+STBIRDEF int stbir_resize_float( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,-1,0, STBIR_TYPE_FLOAT, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
+}
+
+STBIRDEF int stbir_resize_uint8_srgb(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_SRGB);
+}
+
+STBIRDEF int stbir_resize_uint8_srgb_edgemode(const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode)
+{
+ return stbir__resize_arbitrary(NULL, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
+ edge_wrap_mode, edge_wrap_mode, STBIR_COLORSPACE_SRGB);
+}
+
+STBIRDEF int stbir_resize_uint8_generic( const unsigned char *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ unsigned char *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
+ edge_wrap_mode, edge_wrap_mode, space);
+}
+
+STBIRDEF int stbir_resize_uint16_generic(const stbir_uint16 *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ stbir_uint16 *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_UINT16, filter, filter,
+ edge_wrap_mode, edge_wrap_mode, space);
+}
+
+
+STBIRDEF int stbir_resize_float_generic( const float *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ float *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
+ void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, STBIR_TYPE_FLOAT, filter, filter,
+ edge_wrap_mode, edge_wrap_mode, space);
+}
+
+
+STBIRDEF int stbir_resize( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,NULL,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
+ edge_mode_horizontal, edge_mode_vertical, space);
+}
+
+
+STBIRDEF int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float x_scale, float y_scale,
+ float x_offset, float y_offset)
+{
+ float transform[4];
+ transform[0] = x_scale;
+ transform[1] = y_scale;
+ transform[2] = x_offset;
+ transform[3] = y_offset;
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ 0,0,1,1,transform,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
+ edge_mode_horizontal, edge_mode_vertical, space);
+}
+
+STBIRDEF int stbir_resize_region( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
+ void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
+ stbir_datatype datatype,
+ int num_channels, int alpha_channel, int flags,
+ stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
+ stbir_filter filter_horizontal, stbir_filter filter_vertical,
+ stbir_colorspace space, void *alloc_context,
+ float s0, float t0, float s1, float t1)
+{
+ return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
+ output_pixels, output_w, output_h, output_stride_in_bytes,
+ s0,t0,s1,t1,NULL,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
+ edge_mode_horizontal, edge_mode_vertical, space);
+}
+
+#endif // STB_IMAGE_RESIZE_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_image_resize_test/oldir.c b/vendor/stb/stb_image_resize_test/oldir.c
new file mode 100644
index 0000000..e1f3505
--- /dev/null
+++ b/vendor/stb/stb_image_resize_test/oldir.c
@@ -0,0 +1,56 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef _MSC_VER
+#define stop() __debugbreak()
+#else
+#define stop() __builtin_trap()
+#endif
+
+//#define HEAVYTM
+#include "tm.h"
+
+#define STBIR_SATURATE_INT
+#define STB_IMAGE_RESIZE_STATIC
+#define STB_IMAGE_RESIZE_IMPLEMENTATION
+#include "old_image_resize.h"
+
+
+static int types[4] = { STBIR_TYPE_UINT8, STBIR_TYPE_UINT8, STBIR_TYPE_UINT16, STBIR_TYPE_FLOAT };
+static int edges[4] = { STBIR_EDGE_CLAMP, STBIR_EDGE_REFLECT, STBIR_EDGE_ZERO, STBIR_EDGE_WRAP };
+static int flts[5] = { STBIR_FILTER_BOX, STBIR_FILTER_TRIANGLE, STBIR_FILTER_CUBICBSPLINE, STBIR_FILTER_CATMULLROM, STBIR_FILTER_MITCHELL };
+static int channels[20] = { 1, 2, 3, 4, 4,4, 2,2, 4,4, 2,2, 4,4, 2,2, 4,4, 2,2 };
+static int alphapos[20] = { -1, -1, -1, -1, 3,0, 1,0, 3,0, 1,0, 3,0, 1,0, 3,0, 1,0 };
+
+
+void oresize( void * o, int ox, int oy, int op, void * i, int ix, int iy, int ip, int buf, int type, int edg, int flt )
+{
+ int t = types[type];
+ int ic = channels[buf];
+ int alpha = alphapos[buf];
+ int e = edges[edg];
+ int f = flts[flt];
+ int space = ( type == 1 ) ? STBIR_COLORSPACE_SRGB : 0;
+ int flags = ( buf >= 16 ) ? STBIR_FLAG_ALPHA_PREMULTIPLIED : ( ( buf >= 12 ) ? STBIR_FLAG_ALPHA_OUT_PREMULTIPLIED : ( ( buf >= 8 ) ? (STBIR_FLAG_ALPHA_PREMULTIPLIED|STBIR_FLAG_ALPHA_OUT_PREMULTIPLIED) : 0 ) );
+ stbir_uint64 start;
+
+ ENTER( "Resize (old)" );
+ start = tmGetAccumulationStart( tm_mask );
+
+ if(!stbir_resize( i, ix, iy, ip, o, ox, oy, op, t, ic, alpha, flags, e, e, f, f, space, 0 ) )
+ stop();
+
+ #ifdef STBIR_PROFILE
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.setup, "Setup (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.filters, "Filters (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.looping, "Looping (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.vertical, "Vertical (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.horizontal, "Horizontal (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.decode, "Scanline input (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.encode, "Scanline output (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.alpha, "Alpha weighting (old)" );
+ tmEmitAccumulationZone( 0, 0, (tm_uint64 *)&start, 0, oldprofile.named.unalpha, "Alpha unweighting (old)" );
+ #endif
+
+ LEAVE();
+}
diff --git a/vendor/stb/stb_image_resize_test/stbirtest.c b/vendor/stb/stb_image_resize_test/stbirtest.c
new file mode 100644
index 0000000..22e1b82
--- /dev/null
+++ b/vendor/stb/stb_image_resize_test/stbirtest.c
@@ -0,0 +1,992 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+//#define HEAVYTM
+#include "tm.h"
+
+#ifdef RADUSETM3
+tm_api * g_tm_api;
+//#define PROFILE_MODE
+#endif
+
+#include <stddef.h>
+
+#ifdef _MSC_VER
+#define stop() __debugbreak()
+#include <intrin.h>
+#define int64 __int64
+#define uint64 unsigned __int64
+#else
+#define stop() __builtin_trap()
+#define int64 long long
+#define uint64 unsigned long long
+#endif
+
+#ifdef _MSC_VER
+#pragma warning(disable:4127)
+#endif
+
+//#define NOCOMP
+
+
+//#define PROFILE_NEW_ONLY
+//#define PROFILE_MODE
+
+
+#if defined(_x86_64) || defined( __x86_64__ ) || defined( _M_X64 ) || defined(__x86_64) || defined(__SSE2__) || defined(STBIR_SSE) || defined( _M_IX86_FP ) || defined(__i386) || defined( __i386__ ) || defined( _M_IX86 ) || defined( _X86_ )
+
+#ifdef _MSC_VER
+
+ uint64 __rdtsc();
+ #define __cycles() __rdtsc()
+
+#else // non msvc
+
+ static inline uint64 __cycles()
+ {
+ unsigned int lo, hi;
+ asm volatile ("rdtsc" : "=a" (lo), "=d" (hi) );
+ return ( ( (uint64) hi ) << 32 ) | ( (uint64) lo );
+ }
+
+#endif // msvc
+
+#elif defined( _M_ARM64 ) || defined( __aarch64__ ) || defined( __arm64__ ) || defined(__ARM_NEON__)
+
+#ifdef _MSC_VER
+
+ #define __cycles() _ReadStatusReg(ARM64_CNTVCT)
+
+#else
+
+ static inline uint64 __cycles()
+ {
+ uint64 tsc;
+ asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
+ return tsc;
+ }
+
+#endif
+
+#else // x64, arm
+
+#error Unknown platform for timing.
+
+#endif //x64 and
+
+
+#ifdef PROFILE_MODE
+
+#define STBIR_ASSERT(cond)
+
+#endif
+
+#ifdef _DEBUG
+#undef STBIR_ASSERT
+#define STBIR_ASSERT(cond) { if (!(cond)) stop(); }
+#endif
+
+
+#define SHRINKBYW 2
+#define ZOOMBYW 2
+#define SHRINKBYH 2
+#define ZOOMBYH 2
+
+
+int mem_count = 0;
+
+#ifdef TEST_WITH_VALLOC
+
+#define STBIR__SEPARATE_ALLOCATIONS
+
+#if TEST_WITH_LIMIT_AT_FRONT
+
+ void * wmalloc(SIZE_T size)
+ {
+ static unsigned int pagesize=0;
+ void* p;
+ SIZE_T s;
+
+ // get the page size, if we haven't yet
+ if (pagesize==0)
+ {
+ SYSTEM_INFO si;
+ GetSystemInfo(&si);
+ pagesize=si.dwPageSize;
+ }
+
+ // round the requested size up to a whole number of pages
+ s=(size+(pagesize-1))&~(pagesize-1);
+
+ // allocate the rounded-up size (this variant has no guard pages)
+ p=VirtualAlloc(0,(SIZE_T)s,MEM_RESERVE|MEM_COMMIT,PAGE_READWRITE);
+
+ return p;
+ }
+
+ void wfree(void * ptr)
+ {
+ if (ptr)
+ {
+ if ( ((ptrdiff_t)ptr) & 4095 ) stop();
+ if ( VirtualFree(ptr,0,MEM_RELEASE) == 0 ) stop();
+ }
+ }
+
+#else
+
+ void * wmalloc(SIZE_T size)
+ {
+ static unsigned int pagesize=0;
+ void* p;
+ SIZE_T s;
+
+ // get the page size, if we haven't yet
+ if (pagesize==0)
+ {
+ SYSTEM_INFO si;
+ GetSystemInfo(&si);
+ pagesize=si.dwPageSize;
+ }
+
+ // we need room for the size, 8 bytes to hide the original pointer and a
+ // validation dword, and enough data to completely fill one page
+ s=(size+16+(pagesize-1))&~(pagesize-1);
+
+ // allocate the size plus a page (for the guard)
+ p=VirtualAlloc(0,(SIZE_T)(s+pagesize+pagesize),MEM_RESERVE|MEM_COMMIT,PAGE_READWRITE);
+
+ if (p)
+ {
+ DWORD oldprot;
+ void* orig=p;
+
+ // protect the first page
+ VirtualProtect(((char*)p),pagesize,PAGE_NOACCESS,&oldprot);
+
+ // protect the final page
+ VirtualProtect(((char*)p)+s+pagesize,pagesize,PAGE_NOACCESS,&oldprot);
+
+ // now move the returned pointer so that it bumps right up against
+ // the next (protected) page (this may result in unaligned return
+ // addresses - pre-align the sizes if you always want aligned ptrs)
+//#define ERROR_ON_FRONT
+#ifdef ERROR_ON_FRONT
+ p=((char*)p)+pagesize+16;
+#else
+ p=((char*)p)+(s-size)+pagesize;
+#endif
+
+ // hide the validation value and the original pointer (which we'll
+ // need used for freeing) right behind the returned pointer
+ ((unsigned int*)p)[-1]=0x98765432;
+ ((void**)p)[-2]=orig;
+ ++mem_count;
+//printf("aloc: %p bytes: %d\n",p,(int)size);
+ return(p);
+ }
+
+ return 0;
+ }
+
+ void wfree(void * ptr)
+ {
+ if (ptr)
+ {
+ int err=0;
+
+ // is this one of our allocations?
+ if (((((unsigned int*)ptr)[-1])!=0x98765432) || ((((void**)ptr)[-2])==0))
+ {
+ err=1;
+ }
+
+ if (err)
+ {
+ __debugbreak();
+ }
+ else
+ {
+
+ // back up to find the original pointer
+ void* p=((void**)ptr)[-2];
+
+ // clear the validation value and the original pointer
+ ((unsigned int*)ptr)[-1]=0;
+ ((void**)ptr)[-2]=0;
+
+//printf("free: %p\n",ptr);
+
+ --mem_count;
+
+ // now free the pages
+ if (p)
+ VirtualFree(p,0,MEM_RELEASE);
+
+ }
+ }
+ }
+
+#endif
+
+#define STBIR_MALLOC(size,user_data) ((void)(user_data), wmalloc(size))
+#define STBIR_FREE(ptr,user_data) ((void)(user_data), wfree(ptr))
+
+#endif
+
+#define STBIR_PROFILE
+//#define STBIR_NO_SIMD
+//#define STBIR_AVX
+//#define STBIR_AVX2
+#define STB_IMAGE_RESIZE_IMPLEMENTATION
+#include "stb_image_resize2.h" // new one!
+
+#define STB_IMAGE_WRITE_IMPLEMENTATION
+#include "stb_image_write.h"
+
+int tsizes[5] = { 1, 1, 2, 4, 2 };
+int ttypes[5] = { STBIR_TYPE_UINT8, STBIR_TYPE_UINT8_SRGB, STBIR_TYPE_UINT16, STBIR_TYPE_FLOAT, STBIR_TYPE_HALF_FLOAT };
+
+int cedges[4] = { STBIR_EDGE_CLAMP, STBIR_EDGE_REFLECT, STBIR_EDGE_ZERO, STBIR_EDGE_WRAP };
+int flts[5] = { STBIR_FILTER_BOX, STBIR_FILTER_TRIANGLE, STBIR_FILTER_CUBICBSPLINE, STBIR_FILTER_CATMULLROM, STBIR_FILTER_MITCHELL };
+int buffers[20] = { STBIR_1CHANNEL, STBIR_2CHANNEL, STBIR_RGB, STBIR_4CHANNEL,
+ STBIR_BGRA, STBIR_ARGB, STBIR_RA, STBIR_AR,
+ STBIR_RGBA_PM, STBIR_ARGB_PM, STBIR_RA_PM, STBIR_AR_PM,
+ STBIR_RGBA, STBIR_ARGB, STBIR_RA, STBIR_AR,
+ STBIR_RGBA_PM, STBIR_ARGB_PM, STBIR_RA_PM, STBIR_AR_PM,
+ };
+int obuffers[20] = { STBIR_1CHANNEL, STBIR_2CHANNEL, STBIR_RGB, STBIR_4CHANNEL,
+ STBIR_BGRA, STBIR_ARGB, STBIR_RA, STBIR_AR,
+ STBIR_RGBA_PM, STBIR_ARGB_PM, STBIR_RA_PM, STBIR_AR_PM,
+ STBIR_RGBA_PM, STBIR_ARGB_PM, STBIR_RA_PM, STBIR_AR_PM,
+ STBIR_RGBA, STBIR_ARGB, STBIR_RA, STBIR_AR,
+ };
+
+int bchannels[20] = { 1, 2, 3, 4, 4,4, 2,2, 4,4, 2,2, 4,4, 2,2, 4,4, 2,2 };
+int alphapos[20] = { -1, -1, -1, -1, 3,0, 1,0, 3,0, 1,0, 3,0, 1,0,3,0, 1,0 };
+
+
+char const * buffstrs[20] = { "1ch", "2ch", "3ch", "4ch", "RGBA", "ARGB", "RA", "AR", "RGBA_both_pre", "ARGB_both_pre", "RA_both_pre", "AR_both_pre", "RGBA_out_pre", "ARGB_out_pre", "RA_out_pre", "AR_out_pre", "RGBA_in_pre", "ARGB_in_pre", "RA_in_pre", "AR_in_pre" };
+char const * typestrs[5] = { "Bytes", "BytesSRGB", "Shorts", "Floats", "Half Floats"};
+char const * edgestrs[4] = { "Clamp", "Reflect", "Zero", "Wrap" };
+char const * fltstrs[5] = { "Box", "Triangle", "Cubic", "Catmullrom", "Mitchell" };
+
+#ifdef STBIR_PROFILE
+ static void do_acc_zones( STBIR_PROFILE_INFO * profile )
+ {
+ stbir_uint32 j;
+ stbir_uint64 start = tmGetAccumulationStart( tm_mask ); start=start;
+
+ for( j = 0 ; j < profile->count ; j++ )
+ {
+ if ( profile->clocks[j] )
+ tmEmitAccumulationZone( 0, 0, (tm_uint64*)&start, 0, profile->clocks[j], profile->descriptions[j] );
+ }
+ }
+#else
+ #define do_acc_zones(...)
+#endif
+
+int64 vert;
+
+//#define WINTHREADTEST
+#ifdef WINTHREADTEST
+
+static STBIR_RESIZE * thread_resize;
+static LONG which;
+static int threads_started = 0;
+static HANDLE threads[32];
+static HANDLE starts,stops;
+
+static DWORD resize_shim( LPVOID p )
+{
+ for(;;)
+ {
+ LONG wh;
+
+ WaitForSingleObject( starts, INFINITE );
+
+ wh = InterlockedAdd( &which, 1 ) - 1;
+
+ ENTER( "Split %d", wh );
+ stbir_resize_split( thread_resize, wh, 1 );
+ #ifdef STBIR_PROFILE
+ { STBIR_PROFILE_INFO profile; stbir_resize_split_profile_info( &profile, thread_resize, wh, 1 ); do_acc_zones( &profile ); vert = profile.clocks[1]; }
+ #endif
+ LEAVE();
+
+ ReleaseSemaphore( stops, 1, 0 );
+ }
+}
+
+#endif
+
+void nresize( void * o, int ox, int oy, int op, void * i, int ix, int iy, int ip, int buf, int type, int edg, int flt )
+{
+ STBIR_RESIZE resize;
+
+ stbir_resize_init( &resize, i, ix, iy, ip, o, ox, oy, op, buffers[buf], ttypes[type] );
+ stbir_set_pixel_layouts( &resize, buffers[buf], obuffers[buf] );
+ stbir_set_edgemodes( &resize, cedges[edg], cedges[edg] );
+ stbir_set_filters( &resize, flts[flt], /*STBIR_FILTER_POINT_SAMPLE */ flts[flt] );
+ //stbir_set_input_subrect( &resize, 0.55f,0.333f,0.75f,0.50f);
+ //stbir_set_output_pixel_subrect( &resize, 00, 00, ox/2,oy/2);
+ //stbir_set_pixel_subrect(&resize, 1430,1361,30,30);
+
+ ENTER( "Resize" );
+
+ #ifndef WINTHREADTEST
+
+ ENTER( "Filters" );
+ stbir_build_samplers_with_splits( &resize, 1 );
+ #ifdef STBIR_PROFILE
+ { STBIR_PROFILE_INFO profile; stbir_resize_build_profile_info( &profile, &resize ); do_acc_zones( &profile ); }
+ #endif
+ LEAVE();
+
+ ENTER( "Resize" );
+ if(!stbir_resize_extended( &resize ) )
+ stop();
+ #ifdef STBIR_PROFILE
+ { STBIR_PROFILE_INFO profile; stbir_resize_extended_profile_info( &profile, &resize ); do_acc_zones( &profile ); vert = profile.clocks[1]; }
+ #endif
+ LEAVE();
+
+ #else
+ {
+ int c, cnt;
+
+ ENTER( "Filters" );
+ cnt = stbir_build_samplers_with_splits( &resize, 4 );
+ #ifdef STBIR_PROFILE
+ { STBIR_PROFILE_INFO profile; stbir_resize_build_profile_info( &profile, &resize ); do_acc_zones( &profile ); }
+ #endif
+ LEAVE();
+
+ ENTER( "Thread start" );
+ if ( threads_started == 0 )
+ {
+ starts = CreateSemaphore( 0, 0, 32, 0 );
+ stops = CreateSemaphore( 0, 0, 32, 0 );
+ }
+ for( c = threads_started ; c < cnt ; c++ )
+ threads[ c ] = CreateThread( 0, 2048*1024, resize_shim, 0, 0, 0 );
+
+ threads_started = cnt;
+ thread_resize = &resize;
+ which = 0;
+ LEAVE();
+
+ // starts the threads
+ ReleaseSemaphore( starts, cnt, 0 );
+
+ ENTER( "Wait" );
+ for( c = 0 ; c < cnt; c++ )
+ WaitForSingleObject( stops, INFINITE );
+ LEAVE();
+ }
+ #endif
+
+ ENTER( "Free" );
+ stbir_free_samplers( &resize );
+ LEAVE();
+ LEAVE();
+}
+
+
+#define STB_IMAGE_IMPLEMENTATION
+#include "stb_image.h"
+
+extern void oresize( void * o, int ox, int oy, int op, void * i, int ix, int iy, int ip, int buf, int type, int edg, int flt );
+
+
+
+#define TYPESTART 0
+#define TYPEEND 4
+
+#define LAYOUTSTART 0
+#define LAYOUTEND 19
+
+#define SIZEWSTART 0
+#define SIZEWEND 2
+
+#define SIZEHSTART 0
+#define SIZEHEND 2
+
+#define EDGESTART 0
+#define EDGEEND 3
+
+#define FILTERSTART 0
+#define FILTEREND 4
+
+#define HEIGHTSTART 0
+#define HEIGHTEND 2
+
+#define WIDTHSTART 0
+#define WIDTHEND 2
+
+
+
+
+static void * convert8to16( unsigned char * i, int w, int h, int c )
+{
+ unsigned short * ret;
+ int p;
+
+ ret = malloc( w*h*c*sizeof(short) );
+ for(p = 0 ; p < (w*h*c) ; p++ )
+ {
+ ret[p]=(short)((((int)i[p])<<8)+i[p]);
+ }
+
+ return ret;
+}
+
+static void * convert8tof( unsigned char * i, int w, int h, int c )
+{
+ float * ret;
+ int p;
+
+ ret = malloc( w*h*c*sizeof(float) );
+ for(p = 0 ; p < (w*h*c) ; p++ )
+ {
+ ret[p]=((float)i[p])*(1.0f/255.0f);
+ }
+
+ return ret;
+}
+
+static void * convert8tohf( unsigned char * i, int w, int h, int c )
+{
+ stbir__FP16 * ret;
+ int p;
+
+ ret = malloc( w*h*c*sizeof(stbir__FP16) );
+ for(p = 0 ; p < (w*h*c) ; p++ )
+ {
+ ret[p]=stbir__float_to_half(((float)i[p])*(1.0f/255.0f));
+ }
+
+ return ret;
+}
+
+static void * convert8tohff( unsigned char * i, int w, int h, int c )
+{
+ float * ret;
+ int p;
+
+ ret = malloc( w*h*c*sizeof(float) );
+ for(p = 0 ; p < (w*h*c) ; p++ )
+ {
+ ret[p]=stbir__half_to_float(stbir__float_to_half(((float)i[p])*(1.0f/255.0f)));
+ }
+
+ return ret;
+}
+
+static int isprime( int v )
+{
+ int i;
+
+ if ( v <= 3 )
+ return ( v > 1 );
+ if ( ( v & 1 ) == 0 )
+ return 0;
+ if ( ( v % 3 ) == 0 )
+ return 0;
+ i = 5;
+ while ( (i*i) <= v )
+ {
+ if ( ( v % i ) == 0 )
+ return 0;
+ if ( ( v % ( i + 2 ) ) == 0 )
+ return 0;
+ i += 6;
+ }
+
+ return 1;
+}
+
+static int getprime( int v )
+{
+ int i;
+ i = 0;
+ for(;;)
+ {
+ if ( i >= v )
+ return v; // can't find any, just return orig
+ if (isprime(v - i))
+ return v - i;
+ if (isprime(v + i))
+ return v + i;
+ ++i;
+ }
+}
+
+
+int main( int argc, char ** argv )
+{
+ int ix, iy, ic;
+ unsigned char * input[6];
+ char * ir1;
+ char * ir2;
+ int szhs[3];
+ int szws[3];
+ int aw, ah, ac;
+ unsigned char * correctalpha;
+ int layouts, types, heights, widths, edges, filters;
+
+ if ( argc != 2 )
+ {
+ printf("command: stbirtest [imagefile]\n");
+ exit(1);
+ }
+
+ SetupTM( "127.0.0.1" );
+
+ correctalpha = stbi_load( "correctalpha.png", &aw, &ah, &ac, 0 );
+
+ input[0] = stbi_load( argv[1], &ix, &iy, &ic, 0 );
+ input[1] = input[0];
+ input[2] = convert8to16( input[0], ix, iy, ic );
+ input[3] = convert8tof( input[0], ix, iy, ic );
+ input[4] = convert8tohf( input[0], ix, iy, ic );
+ input[5] = convert8tohff( input[0], ix, iy, ic );
+
+ printf("Input %dx%d (%d channels)\n",ix,iy,ic);
+
+ ir1 = malloc( 4 * 4 * 3000 * 3000ULL );
+ ir2 = malloc( 4 * 4 * 3000 * 3000ULL );
+
+ szhs[0] = getprime( iy/SHRINKBYH );
+ szhs[1] = iy;
+ szhs[2] = getprime( iy*ZOOMBYH );
+
+ szws[0] = getprime( ix/SHRINKBYW );
+ szws[1] = ix;
+ szws[2] = getprime( ix*ZOOMBYW );
+
+ #if 1
+ for( types = TYPESTART ; types <= TYPEEND ; types++ )
+ #else
+ for( types = 1 ; types <= 1 ; types++ )
+ #endif
+ {
+ ENTER( "Test type: %s",typestrs[types]);
+ #if 1
+ for( layouts = LAYOUTSTART ; layouts <= LAYOUTEND ; layouts++ )
+ #else
+ for( layouts = 16; layouts <= 16 ; layouts++ )
+ #endif
+ {
+ ENTER( "Test layout: %s",buffstrs[layouts]);
+
+ #if 0
+ for( heights = HEIGHTSTART ; heights <= HEIGHTEND ; heights++ )
+ {
+ int w, h = szhs[heights];
+ #else
+ for( heights = 0 ; heights <= 11 ; heights++ )
+ {
+ static int szhsz[12]={32, 200, 350, 400, 450, 509, 532, 624, 700, 824, 1023, 2053 };
+ int w, h = szhsz[heights];
+ #endif
+
+ ENTER( "Test height: %d %s %d",iy,(h<iy)?"Down":((h>iy)?"Up":"Same"),h);
+
+ #if 0
+ for( widths = WIDTHSTART ; widths <= WIDTHEND ; widths++ )
+ {
+ w = szws[widths];
+ #else
+ for( widths = 0 ; widths <= 12 ; widths++ )
+ {
+ static int szwsz[13]={2, 32, 200, 350, 400, 450, 509, 532, 624, 700, 824, 1023, 2053 };
+ w = szwsz[widths];
+ #endif
+
+ ENTER( "Test width: %d %s %d",ix, (w<ix)?"Down":((w>ix)?"Up":"Same"), w);
+
+ #if 0
+ for( edges = EDGESTART ; edges <= EDGEEND ; edges++ )
+ #else
+ for( edges = 0 ; edges <= 0 ; edges++ )
+ #endif
+ {
+ ENTER( "Test edge: %s",edgestrs[edges]);
+ #if 0
+ for( filters = FILTERSTART ; filters <= FILTEREND ; filters++ )
+ #else
+ for( filters = 3 ; filters <= 3 ; filters++ )
+ #endif
+ {
+ int op, opw, np,npw, c, a;
+ #ifdef COMPARE_SAME
+ int oldtypes = types;
+ #else
+ int oldtypes = (types==4)?3:types;
+ #endif
+
+ ENTER( "Test filter: %s",fltstrs[filters]);
+ {
+ c = bchannels[layouts];
+ a = alphapos[layouts];
+
+ op = w*tsizes[oldtypes]*c + 60;
+ opw = w*tsizes[oldtypes]*c;
+
+ np = w*tsizes[types]*c + 60;
+ npw = w*tsizes[types]*c;
+
+ printf( "%s:layout: %s w: %d h: %d edge: %s filt: %s\n", typestrs[types],buffstrs[layouts], w, h, edgestrs[edges], fltstrs[filters] );
+
+
+ // clear pixel area to different, right edge to zero
+ #ifndef NOCLEAR
+ ENTER( "Test clear padding" );
+ {
+ int d;
+ for( d = 0 ; d < h ; d++ )
+ {
+ int oofs = d * op;
+ int nofs = d * np;
+ memset( ir1 + oofs, 192, opw );
+ memset( ir1 + oofs+opw, 79, op-opw );
+ memset( ir2 + nofs, 255, npw );
+ memset( ir2 + nofs+npw, 79, np-npw );
+ }
+ }
+ LEAVE();
+
+ #endif
+
+ #ifdef COMPARE_SAME
+ #define TIMINGS 1
+ #else
+ #define TIMINGS 1
+ #endif
+ ENTER( "Test both" );
+ {
+ #ifndef PROFILE_NEW_ONLY
+ {
+ int ttt, max = 0x7fffffff;
+ ENTER( "Test old" );
+ for( ttt = 0 ; ttt < TIMINGS ; ttt++ )
+ {
+ int64 m = __cycles();
+
+ oresize( ir1, w, h, op,
+ #ifdef COMPARE_SAME
+ input[types],
+ #else
+ input[(types==4)?5:types],
+ #endif
+ ix, iy, ix*ic*tsizes[oldtypes], layouts, oldtypes, edges, filters );
+
+ m = __cycles() - m;
+ if ( ( (int)m ) < max )
+ max = (int) m;
+ }
+ LEAVE();
+ printf("old: %d\n", max );
+ }
+ #endif
+
+ {
+ int ttt, max = 0x7fffffff, maxv = 0x7fffffff;
+ ENTER( "Test new" );
+ for( ttt = 0 ; ttt < TIMINGS ; ttt++ )
+ {
+ int64 m = __cycles();
+
+ nresize( ir2, w, h, np, input[types], ix, iy, ix*ic*tsizes[types], layouts, types, edges, filters );
+
+ m = __cycles() - m;
+ if ( ( (int)m ) < max )
+ max = (int) m;
+ if ( ( (int)vert ) < maxv )
+ maxv = (int) vert;
+ }
+ LEAVE(); // test new
+ printf("new: %d (v: %d)\n", max, maxv );
+ }
+ }
+ LEAVE(); // test both
+
+ if ( mem_count!= 0 )
+ stop();
+
+ #ifndef NOCOMP
+ ENTER( "Test compare" );
+ {
+ int x,y,ch;
+ int nums = 0;
+ for( y = 0 ; y < h ; y++ )
+ {
+ for( x = 0 ; x < w ; x++ )
+ {
+ switch(types)
+ {
+ case 0:
+ case 1: //SRGB
+ {
+ unsigned char * p1 = (unsigned char *)&ir1[y*op+x*c];
+ unsigned char * p2 = (unsigned char *)&ir2[y*np+x*c];
+ for( ch = 0 ; ch < c ; ch++ )
+ {
+ float pp1,pp2,d;
+ float av = (a==-1)?1.0f:((float)p1[a]/255.0f);
+
+ pp1 = p1[ch];
+ pp2 = p2[ch];
+
+ // compare in premult space
+ #ifndef COMPARE_SAME
+ if ( ( ( layouts >=4 ) && ( layouts <= 7 ) ) || ( ( layouts >=16 ) && ( layouts <= 19 ) ) )
+ {
+ pp1 *= av;
+ pp2 *= av;
+ }
+ #endif
+
+ d = pp1 - pp2;
+ if ( d < 0 ) d = -d;
+
+ #ifdef COMPARE_SAME
+ if ( d > 0 )
+ #else
+ if ( d > 1 )
+ #endif
+ {
+ printf("Error at %d x %d (chan %d) (d: %g a: %g) [%d %d %d %d] [%d %d %d %d]\n",x,y,ch, d,av, p1[0],p1[1],p1[2],p1[3], p2[0],p2[1],p2[2],p2[3]);
+ ++nums;
+ if ( nums > 16 ) goto ex;
+ //if (d) exit(1);
+ //goto ex;
+ }
+ }
+ }
+ break;
+
+ case 2:
+ {
+ unsigned short * p1 = (unsigned short *)&ir1[y*op+x*c*sizeof(short)];
+ unsigned short * p2 = (unsigned short *)&ir2[y*np+x*c*sizeof(short)];
+ for( ch = 0 ; ch < c ; ch++ )
+ {
+ float thres,pp1,pp2,d;
+ float av = (a==-1)?1.0f:((float)p1[a]/65535.0f);
+
+ pp1 = p1[ch];
+ pp2 = p2[ch];
+
+ // compare in premult space
+ #ifndef COMPARE_SAME
+ if ( ( ( layouts >=4 ) && ( layouts <= 7 ) ) || ( ( layouts >= 16 ) && ( layouts <= 19 ) ) )
+ {
+ pp1 *= av;
+ pp2 *= av;
+ }
+ #endif
+
+ d = pp1 - pp2;
+ if ( d < 0 ) d = -d;
+
+ thres=((float)p1[ch]*0.007f)+2.0f;
+ if (thres<4) thres = 4;
+
+ #ifdef COMPARE_SAME
+ if ( d > 0 )
+ #else
+ if ( d > thres)
+ #endif
+ {
+ printf("Error at %d x %d (chan %d) %d %d [df: %g th: %g al: %g] (%d %d %d %d) (%d %d %d %d)\n",x,y,ch, p1[ch],p2[ch],d,thres,av,p1[0],p1[1],p1[2],p1[3],p2[0],p2[1],p2[2],p2[3]);
+ ++nums;
+ if ( nums > 16 ) goto ex;
+ //if (d) exit(1);
+ //goto ex;
+ }
+ }
+ }
+ break;
+
+ case 3:
+ {
+ float * p1 = (float *)&ir1[y*op+x*c*sizeof(float)];
+ float * p2 = (float *)&ir2[y*np+x*c*sizeof(float)];
+ for( ch = 0 ; ch < c ; ch++ )
+ {
+ float pp1 = p1[ch], pp2 = p2[ch];
+ float av = (a==-1)?1.0f:p1[a];
+ float thres, d;
+
+ // clamp
+ if (pp1<=0.0f) pp1 = 0;
+ if (pp2<=0.0f) pp2 = 0;
+ if (av<=0.0f) av = 0;
+ if (pp1>1.0f) pp1 = 1.0f;
+ if (pp2>1.0f) pp2 = 1.0f;
+ if (av>1.0f) av = 1.0f;
+
+ // compare in premult space
+ #ifndef COMPARE_SAME
+ if ( ( ( layouts >=4 ) && ( layouts <= 7 ) ) || ( ( layouts >= 16 ) && ( layouts <= 19 ) ) )
+ {
+ pp1 *= av;
+ pp2 *= av;
+ }
+ #endif
+
+ d = pp1 - pp2;
+ if ( d < 0 ) d = -d;
+
+ thres=(p1[ch]*0.002f)+0.0002f;
+ if ( thres < 0 ) thres = -thres;
+
+ #ifdef COMPARE_SAME
+ if ( d != 0.0f )
+ #else
+ if ( d > thres )
+ #endif
+ {
+ printf("Error at %d x %d (chan %d) %g %g [df: %g th: %g al: %g] (%g %g %g %g) (%g %g %g %g)\n",x,y,ch, p1[ch],p2[ch],d,thres,av,p1[0],p1[1],p1[2],p1[3],p2[0],p2[1],p2[2],p2[3]);
+ ++nums;
+ if ( nums > 16 ) goto ex;
+ //if (d) exit(1);
+ //goto ex;
+ }
+ }
+ }
+ break;
+
+ case 4:
+ {
+ #ifdef COMPARE_SAME
+ stbir__FP16 * p1 = (stbir__FP16 *)&ir1[y*op+x*c*sizeof(stbir__FP16)];
+ #else
+ float * p1 = (float *)&ir1[y*op+x*c*sizeof(float)];
+ #endif
+ stbir__FP16 * p2 = (stbir__FP16 *)&ir2[y*np+x*c*sizeof(stbir__FP16)];
+ for( ch = 0 ; ch < c ; ch++ )
+ {
+ #ifdef COMPARE_SAME
+ float pp1 = stbir__half_to_float(p1[ch]);
+ float av = (a==-1)?1.0f:stbir__half_to_float(p1[a]);
+ #else
+ float pp1 = stbir__half_to_float(stbir__float_to_half(p1[ch]));
+ float av = (a==-1)?1.0f:stbir__half_to_float(stbir__float_to_half(p1[a]));
+ #endif
+ float pp2 = stbir__half_to_float(p2[ch]);
+ float d, thres;
+
+ // clamp
+ if (pp1<=0.0f) pp1 = 0;
+ if (pp2<=0.0f) pp2 = 0;
+ if (av<=0.0f) av = 0;
+ if (pp1>1.0f) pp1 = 1.0f;
+ if (pp2>1.0f) pp2 = 1.0f;
+ if (av>1.0f) av = 1.0f;
+
+ thres=(pp1*0.002f)+0.0002f;
+
+ // compare in premult space
+ #ifndef COMPARE_SAME
+ if ( ( ( layouts >=4 ) && ( layouts <= 7 ) ) || ( ( layouts >= 16 ) && ( layouts <= 19 ) ) )
+ {
+ pp1 *= av;
+ pp2 *= av;
+ }
+ #endif
+
+ d = pp1 - pp2;
+ if ( d < 0 ) d = -d;
+
+
+ #ifdef COMPARE_SAME
+ if ( d != 0.0f )
+ #else
+ if ( d > thres )
+ #endif
+ {
+ printf("Error at %d x %d (chan %d) %g %g [df: %g th: %g al: %g] (%g %g %g %g) (%g %g %g %g)\n",x,y,ch,
+ #ifdef COMPARE_SAME
+ stbir__half_to_float(p1[ch]),
+ #else
+ p1[ch],
+ #endif
+ stbir__half_to_float(p2[ch]),
+ d,thres,av,
+ #ifdef COMPARE_SAME
+ stbir__half_to_float(p1[0]),stbir__half_to_float(p1[1]),stbir__half_to_float(p1[2]),stbir__half_to_float(p1[3]),
+ #else
+ p1[0],p1[1],p1[2],p1[3],
+ #endif
+ stbir__half_to_float(p2[0]),stbir__half_to_float(p2[1]),stbir__half_to_float(p2[2]),stbir__half_to_float(p2[3]) );
+ ++nums;
+ if ( nums > 16 ) goto ex;
+ //if (d) exit(1);
+ //goto ex;
+ }
+ }
+ }
+ break;
+ }
+ }
+
+ for( x = (w*c)*tsizes[oldtypes]; x < op; x++ )
+ {
+ if ( ir1[y*op+x] != 79 )
+ {
+ printf("Margin error at %d x %d %d (should be 79) OLD!\n",x,y,(unsigned char)ir1[y*op+x]);
+ goto ex;
+ }
+ }
+
+ for( x = (w*c)*tsizes[types]; x < np; x++ )
+ {
+ if ( ir2[y*np+x] != 79 )
+ {
+ printf("Margin error at %d x %d %d (should be 79) NEW\n",x,y,(unsigned char)ir2[y*np+x]);
+ goto ex;
+ }
+ }
+ }
+
+ ex:
+ ENTER( "OUTPUT IMAGES" );
+ printf(" tot pix: %d, errs: %d\n", w*h*c,nums );
+
+ if (nums)
+ {
+ stbi_write_png("old.png", w, h, c, ir1, op);
+ stbi_write_png("new.png", w, h, c, ir2, np);
+ exit(1);
+ }
+
+ LEAVE(); // output images
+ }
+ LEAVE(); //test compare
+ #endif
+
+
+
+ }
+ LEAVE(); // test filter
+ }
+ LEAVE(); // test edge
+ }
+ LEAVE(); // test width
+ }
+ LEAVE(); // test height
+ }
+ LEAVE(); // test type
+ }
+ LEAVE(); // test layout
+ }
+
+ CloseTM();
+ return 0;
+}
diff --git a/vendor/stb/stb_image_resize_test/vf_train.c b/vendor/stb/stb_image_resize_test/vf_train.c
new file mode 100644
index 0000000..0fdbe27
--- /dev/null
+++ b/vendor/stb/stb_image_resize_test/vf_train.c
@@ -0,0 +1,999 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define stop() __debugbreak()
+#include <windows.h>
+#define int64 __int64
+
+#pragma warning(disable:4127)
+
+#define STBIR__WEIGHT_TABLES
+#define STBIR_PROFILE
+#define STB_IMAGE_RESIZE_IMPLEMENTATION
+#include "stb_image_resize2.h"
+
+static int * file_read( char const * filename )
+{
+ size_t s;
+ int * m;
+ FILE * f = fopen( filename, "rb" );
+ if ( f == 0 ) return 0;
+
+ fseek( f, 0, SEEK_END);
+ s = ftell( f );
+ fseek( f, 0, SEEK_SET);
+ m = malloc( s + 4 );
+ m[0] = (int)s;
+ fread( m+1, 1, s, f);
+ fclose(f);
+
+ return( m );
+}
+
+typedef struct fileinfo
+{
+ int * timings;
+ int timing_count;
+ int dimensionx, dimensiony;
+ int numtypes;
+ int * types;
+ int * effective;
+ int cpu;
+ int simd;
+ int numinputrects;
+ int * inputrects;
+ int outputscalex, outputscaley;
+ int milliseconds;
+ int64 cycles;
+ double scale_time;
+ int bitmapx, bitmapy;
+ char const * filename;
+} fileinfo;
+
+int numfileinfo;
+fileinfo fi[256];
+unsigned char * bitmap;
+int bitmapw, bitmaph, bitmapp;
+
+static int use_timing_file( char const * filename, int index )
+{
+ int * base = file_read( filename );
+ int * file = base;
+
+ if ( base == 0 ) return 0;
+
+ ++file; // skip file image size;
+ if ( *file++ != 'VFT1' ) return 0;
+ fi[index].cpu = *file++;
+ fi[index].simd = *file++;
+ fi[index].dimensionx = *file++;
+ fi[index].dimensiony = *file++;
+ fi[index].numtypes = *file++;
+ fi[index].types = file; file += fi[index].numtypes;
+ fi[index].effective = file; file += fi[index].numtypes;
+ fi[index].numinputrects = *file++;
+ fi[index].inputrects = file; file += fi[index].numinputrects * 2;
+ fi[index].outputscalex = *file++;
+ fi[index].outputscaley = *file++;
+ fi[index].milliseconds = *file++;
+ fi[index].cycles = ((int64*)file)[0]; file += 2;
+ fi[index].filename = filename;
+
+ fi[index].timings = file;
+ fi[index].timing_count = (int) ( ( base[0] - ( ((char*)file - (char*)base - sizeof(int) ) ) ) / (sizeof(int)*2) );
+
+ fi[index].scale_time = (double)fi[index].milliseconds / (double)fi[index].cycles;
+
+ return 1;
+}
+
+static int vert_first( float weights_table[STBIR_RESIZE_CLASSIFICATIONS][4], int ox, int oy, int ix, int iy, int filter, STBIR__V_FIRST_INFO * v_info )
+{
+ float h_scale=(float)ox/(float)(ix);
+ float v_scale=(float)oy/(float)(iy);
+ stbir__support_callback * support = stbir__builtin_supports[filter];
+ int vertical_filter_width = stbir__get_filter_pixel_width(support,v_scale,0);
+ int vertical_gather = ( v_scale >= ( 1.0f - stbir__small_float ) ) || ( vertical_filter_width <= STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT );
+
+ return stbir__should_do_vertical_first( weights_table, stbir__get_filter_pixel_width(support,h_scale,0), h_scale, ox, vertical_filter_width, v_scale, oy, vertical_gather, v_info );
+}
+
+#define STB_IMAGE_WRITE_IMPLEMENTATION
+#include "stb_image_write.h"
+
+static void alloc_bitmap()
+{
+ int findex;
+ int x = 0, y = 0;
+ int w = 0, h = 0;
+
+ for( findex = 0 ; findex < numfileinfo ; findex++ )
+ {
+ int nx, ny;
+ int thisw, thish;
+
+ thisw = ( fi[findex].dimensionx * fi[findex].numtypes ) + ( fi[findex].numtypes - 1 );
+ thish = ( fi[findex].dimensiony * fi[findex].numinputrects ) + ( fi[findex].numinputrects - 1 );
+
+ for(;;)
+ {
+ nx = x + ((x)?4:0) + thisw;
+ ny = y + ((y)?4:0) + thish;
+ if ( ( nx <= 3600 ) || ( x == 0 ) )
+ {
+ fi[findex].bitmapx = x + ((x)?4:0);
+ fi[findex].bitmapy = y + ((y)?4:0);
+ x = nx;
+ if ( x > w ) w = x;
+ if ( ny > h ) h = ny;
+ break;
+ }
+ else
+ {
+ x = 0;
+ y = h;
+ }
+ }
+ }
+
+ w = (w+3) & ~3;
+ bitmapw = w;
+ bitmaph = h;
+ bitmapp = w * 3; // RGB
+ bitmap = malloc( bitmapp * bitmaph );
+
+ memset( bitmap, 0, bitmapp * bitmaph );
+}
+
+static void build_bitmap( float weights[STBIR_RESIZE_CLASSIFICATIONS][4], int do_channel_count_index, int findex )
+{
+ static int colors[STBIR_RESIZE_CLASSIFICATIONS];
+ STBIR__V_FIRST_INFO v_info = {0};
+
+ int * ts;
+ int ir;
+ unsigned char * bitm = bitmap + ( fi[findex].bitmapx*3 ) + ( fi[findex].bitmapy*bitmapp) ;
+
+ for( ir = 0; ir < STBIR_RESIZE_CLASSIFICATIONS ; ir++ ) colors[ ir ] = 127*ir/STBIR_RESIZE_CLASSIFICATIONS+128;
+
+ ts = fi[findex].timings;
+
+ for( ir = 0 ; ir < fi[findex].numinputrects ; ir++ )
+ {
+ int ix, iy, chanind;
+ ix = fi[findex].inputrects[ir*2];
+ iy = fi[findex].inputrects[ir*2+1];
+
+ for( chanind = 0 ; chanind < fi[findex].numtypes ; chanind++ )
+ {
+ int ofs, h, hh;
+
+ // just do the type that we're on
+ if ( chanind != do_channel_count_index )
+ {
+ ts += 2 * fi[findex].dimensionx * fi[findex].dimensiony;
+ continue;
+ }
+
+ // bitmap offset
+ ofs=chanind*(fi[findex].dimensionx+1)*3+ir*(fi[findex].dimensiony+1)*bitmapp;
+
+ h = 1;
+ for( hh = 0 ; hh < fi[findex].dimensiony; hh++ )
+ {
+ int ww, w = 1;
+ for( ww = 0 ; ww < fi[findex].dimensionx; ww++ )
+ {
+ int good, v_first, VF, HF;
+
+ VF = ts[0];
+ HF = ts[1];
+
+ v_first = vert_first( weights, w, h, ix, iy, STBIR_FILTER_MITCHELL, &v_info );
+
+ good = ( ((HF<=VF) && (!v_first)) || ((VF<=HF) && (v_first)));
+
+ if ( good )
+ {
+ bitm[ofs+2] = 0;
+ bitm[ofs+1] = (unsigned char)colors[v_info.v_resize_classification];
+ }
+ else
+ {
+ double r;
+
+ if ( HF < VF )
+ r = (double)(VF-HF)/(double)HF;
+ else
+ r = (double)(HF-VF)/(double)VF;
+
+ if ( r > 0.4f) r = 0.4;
+ r *= 1.0f/0.4f;
+
+ bitm[ofs+2] = (char)(255.0f*r);
+ bitm[ofs+1] = (char)(((float)colors[v_info.v_resize_classification])*(1.0f-r));
+ }
+ bitm[ofs] = 0;
+
+ ofs += 3;
+ ts += 2;
+ w += fi[findex].outputscalex;
+ }
+ ofs += bitmapp - fi[findex].dimensionx*3;
+ h += fi[findex].outputscaley;
+ }
+ }
+ }
+}
+
+static void build_comp_bitmap( float weights[STBIR_RESIZE_CLASSIFICATIONS][4], int do_channel_count_index )
+{
+ int * ts0;
+ int * ts1;
+ int ir;
+ unsigned char * bitm = bitmap + ( fi[0].bitmapx*3 ) + ( fi[0].bitmapy*bitmapp) ;
+
+ ts0 = fi[0].timings;
+ ts1 = fi[1].timings;
+
+ for( ir = 0 ; ir < fi[0].numinputrects ; ir++ )
+ {
+ int ix, iy, chanind;
+ ix = fi[0].inputrects[ir*2];
+ iy = fi[0].inputrects[ir*2+1];
+
+ for( chanind = 0 ; chanind < fi[0].numtypes ; chanind++ )
+ {
+ int ofs, h, hh;
+
+ // just do the type that we're on
+ if ( chanind != do_channel_count_index )
+ {
+ ts0 += 2 * fi[0].dimensionx * fi[0].dimensiony;
+ ts1 += 2 * fi[0].dimensionx * fi[0].dimensiony;
+ continue;
+ }
+
+ // bitmap offset
+ ofs=chanind*(fi[0].dimensionx+1)*3+ir*(fi[0].dimensiony+1)*bitmapp;
+
+ h = 1;
+ for( hh = 0 ; hh < fi[0].dimensiony; hh++ )
+ {
+ int ww, w = 1;
+ for( ww = 0 ; ww < fi[0].dimensionx; ww++ )
+ {
+ int v_first, time0, time1;
+
+ v_first = vert_first( weights, w, h, ix, iy, STBIR_FILTER_MITCHELL, 0 );
+
+ time0 = ( v_first ) ? ts0[0] : ts0[1];
+ time1 = ( v_first ) ? ts1[0] : ts1[1];
+
+ if ( time0 < time1 )
+ {
+ double r = (double)(time1-time0)/(double)time0;
+ if ( r > 0.4f) r = 0.4;
+ r *= 1.0f/0.4f;
+ bitm[ofs+2] = 0;
+ bitm[ofs+1] = (char)(255.0f*r);
+ bitm[ofs] = (char)(64.0f*(1.0f-r));
+ }
+ else
+ {
+ double r = (double)(time0-time1)/(double)time1;
+ if ( r > 0.4f) r = 0.4;
+ r *= 1.0f/0.4f;
+ bitm[ofs+2] = (char)(255.0f*r);
+ bitm[ofs+1] = 0;
+ bitm[ofs] = (char)(64.0f*(1.0f-r));
+ }
+ ofs += 3;
+ ts0 += 2;
+ ts1 += 2;
+ w += fi[0].outputscalex;
+ }
+ ofs += bitmapp - fi[0].dimensionx*3;
+ h += fi[0].outputscaley;
+ }
+ }
+ }
+}
+
+static void write_bitmap()
+{
+ stbi_write_png( "results.png", bitmapp / 3, bitmaph, 3|STB_IMAGE_BGR, bitmap, bitmapp );
+}
+
+
+static void calc_errors( float weights_table[STBIR_RESIZE_CLASSIFICATIONS][4], int * curtot, double * curerr, int do_channel_count_index )
+{
+ int th, findex;
+ STBIR__V_FIRST_INFO v_info = {0};
+
+ for(th=0;th 200 )
+ {
+ int findex;
+
+ do_bitmap:
+ lasttick = t;
+ newbest = 0;
+
+ for( findex = 0 ; findex < numfileinfo ; findex++ )
+ build_bitmap( best_output_weights, channel_count_index, findex );
+
+ lasttick = GetTickCount();
+ }
+ }
+ }
+
+ windowranges[ channel_count_index ] = range;
+
+ // advance all the weights and loop
+ --range;
+ } while( ( range >= 0 ) && ( !windowstatus ) );
+
+ // if we hit here, then we tried all weights for this opt, so save them
+}
+
+static void print_struct( float weight[5][STBIR_RESIZE_CLASSIFICATIONS][4], char const * name )
+{
+ printf("\n\nstatic float %s[5][STBIR_RESIZE_CLASSIFICATIONS][4]=\n{", name );
+ {
+ int i;
+ for(i=0;i<5;i++)
+ {
+ int th;
+ for(th=0;th 60000)
+ sprintf( time, "%dm %ds",ms/60000, (ms/1000)%60 );
+ else
+ sprintf( time, "%ds",ms/1000 );
+ return time;
+}
+
+static BITMAPINFOHEADER bmiHeader;
+static DWORD extrawindoww, extrawindowh;
+static HINSTANCE instance;
+static int curzoom = 1;
+
+static LRESULT WINAPI WindowProc( HWND window,
+ UINT message,
+ WPARAM wparam,
+ LPARAM lparam )
+{
+ switch( message )
+ {
+ case WM_CHAR:
+ if ( wparam != 27 )
+ break;
+ // falls through
+
+ case WM_CLOSE:
+ {
+ int i;
+ int max = 0;
+
+ for( i = 0 ; i < fi[0].numtypes ; i++ )
+ if( windowranges[i] > max ) max = windowranges[i];
+
+ if ( ( max == 0 ) || ( MessageBox( window, "Cancel before training is finished?", "Vertical First Training", MB_OKCANCEL|MB_ICONSTOP ) == IDOK ) )
+ {
+ for( i = 0 ; i < fi[0].numtypes ; i++ )
+ if( windowranges[i] > max ) max = windowranges[i];
+ if ( max )
+ windowstatus = 1;
+ DestroyWindow( window );
+ }
+ }
+ return 0;
+
+ case WM_PAINT:
+ {
+ PAINTSTRUCT ps;
+ HDC dc;
+
+ dc = BeginPaint( window, &ps );
+ StretchDIBits( dc,
+ 0, 0, bitmapw*curzoom, bitmaph*curzoom,
+ 0, 0, bitmapw, bitmaph,
+ bitmap, (BITMAPINFO*)&bmiHeader, DIB_RGB_COLORS, SRCCOPY );
+
+ PatBlt( dc, bitmapw*curzoom, 0, 4096, 4096, WHITENESS );
+ PatBlt( dc, 0, bitmaph*curzoom, 4096, 4096, WHITENESS );
+
+ SetTextColor( dc, RGB(0,0,0) );
+ SetBkColor( dc, RGB(255,255,255) );
+ SetBkMode( dc, OPAQUE );
+
+ {
+ int i, l = 0, max = 0;
+ char buf[1024];
+ RECT rc;
+ POINT p;
+
+ for( i = 0 ; i < fi[0].numtypes ; i++ )
+ {
+ l += sprintf( buf + l, "channels: %d %s\n", fi[0].effective[i], windowranges[i] ? expand_to_string( windowranges[i] ) : "Done." );
+ if ( windowranges[i] > max ) max = windowranges[i];
+ }
+
+ rc.left = 32; rc.top = bitmaph*curzoom+10;
+ rc.right = 512; rc.bottom = rc.top + 512;
+ DrawText( dc, buf, -1, &rc, DT_TOP );
+
+ l = 0;
+ if ( max == 0 )
+ {
+ static DWORD traindone = 0;
+ if ( traindone == 0 ) traindone = GetTickCount();
+ l = sprintf( buf, "Finished in %s.", gettime( traindone - trainstart ) );
+ }
+ else if ( max != MAXRANGE )
+ l = sprintf( buf, "Done in %s...", gettime( (int) ( ( ( (int64)max * ( (int64)GetTickCount() - (int64)trainstart ) ) ) / (int64) ( MAXRANGE - max ) ) ) );
+
+ GetCursorPos( &p );
+ ScreenToClient( window, &p );
+
+ if ( ( p.x >= 0 ) && ( p.y >= 0 ) && ( p.x < (bitmapw*curzoom) ) && ( p.y < (bitmaph*curzoom) ) )
+ {
+ int findex;
+ int x, y, w, h, sx, sy, ix, iy, ox, oy;
+ int ir, chanind;
+ int * ts;
+ char badstr[64];
+ STBIR__V_FIRST_INFO v_info={0};
+
+ p.x /= curzoom;
+ p.y /= curzoom;
+
+ for( findex = 0 ; findex < numfileinfo ; findex++ )
+ {
+ x = fi[findex].bitmapx;
+ y = fi[findex].bitmapy;
+ w = x + ( fi[findex].dimensionx + 1 ) * fi[findex].numtypes;
+ h = y + ( fi[findex].dimensiony + 1 ) * fi[findex].numinputrects;
+
+ if ( ( p.x >= x ) && ( p.y >= y ) && ( p.x < w ) && ( p.y < h ) )
+ goto found;
+ }
+ goto nope;
+
+ found:
+
+ ir = ( p.y - y ) / ( fi[findex].dimensiony + 1 );
+ sy = ( p.y - y ) % ( fi[findex].dimensiony + 1 );
+ if ( sy >= fi[findex].dimensiony ) goto nope;
+
+ chanind = ( p.x - x ) / ( fi[findex].dimensionx + 1 );
+ sx = ( p.x - x ) % ( fi[findex].dimensionx + 1 );
+ if ( sx >= fi[findex].dimensionx ) goto nope;
+
+ ix = fi[findex].inputrects[ir*2];
+ iy = fi[findex].inputrects[ir*2+1];
+
+ ts = fi[findex].timings + ( ( fi[findex].dimensionx * fi[findex].dimensiony * fi[findex].numtypes * ir ) + ( fi[findex].dimensionx * fi[findex].dimensiony * chanind ) + ( fi[findex].dimensionx * sy ) + sx ) * 2;
+
+ ox = 1+fi[findex].outputscalex*sx;
+ oy = 1+fi[findex].outputscaley*sy;
+
+ if ( windowstatus != 2 )
+ {
+ int VF, HF, v_first, good;
+ VF = ts[0];
+ HF = ts[1];
+
+ v_first = vert_first( retrain_weights[chanind], ox, oy, ix, iy, STBIR_FILTER_MITCHELL, &v_info );
+
+ good = ( ((HF<=VF) && (!v_first)) || ((VF<=HF) && (v_first)));
+
+ if ( good )
+ badstr[0] = 0;
+ else
+ {
+ double r;
+
+ if ( HF < VF )
+ r = (double)(VF-HF)/(double)HF;
+ else
+ r = (double)(HF-VF)/(double)VF;
+ sprintf( badstr, " %.1f%% off", r*100 );
+ }
+ sprintf( buf + l, "\n\n%s\nCh: %d Resize: %dx%d to %dx%d\nV: %d H: %d Order: %c (%s%s)\nClass: %d Scale: %.2f %s", fi[findex].filename,fi[findex].effective[chanind], ix,iy,ox,oy, VF, HF, v_first?'V':'H', good?"Good":"Wrong", badstr, v_info.v_resize_classification, (double)oy/(double)iy, v_info.is_gather ? "Gather" : "Scatter" );
+ }
+ else
+ {
+ int v_first, time0, time1;
+ float (* weights)[4] = stbir__compute_weights[chanind];
+ int * ts1;
+ char b0[32], b1[32];
+
+ ts1 = fi[1].timings + ( ts - fi[0].timings );
+
+ v_first = vert_first( weights, ox, oy, ix, iy, STBIR_FILTER_MITCHELL, &v_info );
+
+ time0 = ( v_first ) ? ts[0] : ts[1];
+ time1 = ( v_first ) ? ts1[0] : ts1[1];
+
+ b0[0] = b1[0] = 0;
+ if ( time0 < time1 )
+ sprintf( b0," (%.f%% better)", ((double)time1-(double)time0)*100.0f/(double)time0);
+ else
+ sprintf( b1," (%.f%% better)", ((double)time0-(double)time1)*100.0f/(double)time1);
+
+ sprintf( buf + l, "\n\n0: %s\n1: %s\nCh: %d Resize: %dx%d to %dx%d\nClass: %d Scale: %.2f %s\nTime0: %d%s\nTime1: %d%s", fi[0].filename, fi[1].filename, fi[0].effective[chanind], ix,iy,ox,oy, v_info.v_resize_classification, (double)oy/(double)iy, v_info.is_gather ? "Gather" : "Scatter", time0, b0, time1, b1 );
+ }
+ }
+ nope:
+
+ rc.left = 32+320; rc.right = 512+320;
+ SetTextColor( dc, RGB(0,0,128) );
+ DrawText( dc, buf, -1, &rc, DT_TOP );
+
+ }
+ EndPaint( window, &ps );
+ return 0;
+ }
+
+ case WM_TIMER:
+ InvalidateRect( window, 0, 0 );
+ return 0;
+
+ case WM_DESTROY:
+ PostQuitMessage( 0 );
+ return 0;
+ }
+
+
+ return DefWindowProc( window, message, wparam, lparam );
+}
+
+static void SetHighDPI(void)
+{
+ typedef HRESULT WINAPI setdpitype(int v);
+ HMODULE h=LoadLibrary("Shcore.dll");
+ if (h)
+ {
+ setdpitype * sd = (setdpitype*)GetProcAddress(h,"SetProcessDpiAwareness");
+ if (sd )
+ sd(1);
+ }
+}
+
+static void draw_window()
+{
+ WNDCLASS wc;
+ HWND w;
+ MSG msg;
+
+ instance = GetModuleHandle(NULL);
+
+ wc.style = 0;
+ wc.lpfnWndProc = WindowProc;
+ wc.cbClsExtra = 0;
+ wc.cbWndExtra = 0;
+ wc.hInstance = instance;
+ wc.hIcon = 0;
+ wc.hCursor = LoadCursor(NULL, IDC_ARROW);
+ wc.hbrBackground = 0;
+ wc.lpszMenuName = 0;
+ wc.lpszClassName = "WHTrain";
+
+ if ( !RegisterClass( &wc ) )
+ exit(1);
+
+ SetHighDPI();
+
+ bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
+ bmiHeader.biWidth = bitmapp/3;
+ bmiHeader.biHeight = -bitmaph;
+ bmiHeader.biPlanes = 1;
+ bmiHeader.biBitCount = 24;
+ bmiHeader.biCompression = BI_RGB;
+
+ w = CreateWindow( "WHTrain",
+ "Vertical First Training",
+ WS_CAPTION | WS_POPUP| WS_CLIPCHILDREN |
+ WS_SYSMENU | WS_MINIMIZEBOX | WS_SIZEBOX,
+ CW_USEDEFAULT,CW_USEDEFAULT,
+ CW_USEDEFAULT,CW_USEDEFAULT,
+ 0, 0, instance, 0 );
+
+ {
+ RECT r, c;
+ GetWindowRect( w, &r );
+ GetClientRect( w, &c );
+ extrawindoww = ( r.right - r.left ) - ( c.right - c.left );
+ extrawindowh = ( r.bottom - r.top ) - ( c.bottom - c.top );
+ SetWindowPos( w, 0, 0, 0, bitmapw * curzoom + extrawindoww, bitmaph * curzoom + extrawindowh + 164, SWP_NOMOVE );
+ }
+
+ ShowWindow( w, SW_SHOWNORMAL );
+ SetTimer( w, 1, 250, 0 );
+
+ {
+ BOOL ret;
+ while( ( ret = GetMessage( &msg, w, 0, 0 ) ) != 0 )
+ {
+ if ( ret == -1 )
+ break;
+ TranslateMessage( &msg );
+ DispatchMessage( &msg );
+ }
+ }
+}
+
+static void retrain()
+{
+ HANDLE threads[ 16 ];
+ int chanind;
+
+ trainstart = GetTickCount();
+ for( chanind = 0 ; chanind < fi[0].numtypes ; chanind++ )
+ threads[ chanind ] = CreateThread( 0, 2048*1024, retrain_shim, (LPVOID)(size_t)chanind, 0, 0 );
+
+ draw_window();
+
+ for( chanind = 0 ; chanind < fi[0].numtypes ; chanind++ )
+ {
+ WaitForSingleObject( threads[ chanind ], INFINITE );
+ CloseHandle( threads[ chanind ] );
+ }
+
+ write_bitmap();
+
+ print_struct( retrain_weights, "retained_weights" );
+ if ( windowstatus ) printf( "CANCELLED!\n" );
+}
+
+static void info()
+{
+ int findex;
+
+ // display info about each input file
+ for( findex = 0 ; findex < numfileinfo ; findex++ )
+ {
+ int i, h,m,s;
+ if ( findex ) printf( "\n" );
+ printf( "Timing file: %s\n", fi[findex].filename );
+ printf( "CPU type: %d %s\n", fi[findex].cpu, fi[findex].simd?(fi[findex].simd==2?"SIMD8":"SIMD4"):"Scalar" );
+ h = fi[findex].milliseconds/3600000;
+ m = (fi[findex].milliseconds-h*3600000)/60000;
+ s = (fi[findex].milliseconds-h*3600000-m*60000)/1000;
+ printf( "Total time in test: %dh %dm %ds Cycles/sec: %.f\n", h,m,s, 1000.0/fi[findex].scale_time );
+ printf( "Each tile of samples is %dx%d, and is scaled by %dx%d.\n", fi[findex].dimensionx,fi[findex].dimensiony, fi[findex].outputscalex,fi[findex].outputscaley );
+ printf( "So the x coords are: " );
+ for( i=0; i < fi[findex].dimensionx ; i++ ) printf( "%d ",1+i*fi[findex].outputscalex );
+ printf( "\n" );
+ printf( "And the y coords are: " );
+ for( i=0; i < fi[findex].dimensiony ; i++ ) printf( "%d ",1+i*fi[findex].outputscaley );
+ printf( "\n" );
+ printf( "There are %d channel counts and they are: ", fi[findex].numtypes );
+ for( i=0; i < fi[findex].numtypes ; i++ ) printf( "%d ",fi[findex].effective[i] );
+ printf( "\n" );
+ printf( "There are %d input rect sizes and they are: ", fi[findex].numinputrects );
+   for( i=0; i < fi[findex].numinputrects ; i++ ) printf( "%dx%d ",fi[findex].inputrects[i*2],fi[findex].inputrects[i*2+1] );
+ printf( "\n" );
+ }
+}
+
+static void current( int do_win, int do_bitmap )
+{
+ int i, findex;
+
+ trainstart = GetTickCount();
+
+ // clear progress
+ memset( windowranges, 0, sizeof( windowranges ) );
+ // copy in appropriate weights
+ memcpy( retrain_weights, stbir__compute_weights, sizeof( retrain_weights ) );
+
+ // build and print current errors and build current bitmap
+ for( i = 0 ; i < fi[0].numtypes ; i++ )
+ {
+ double curerr[STBIR_RESIZE_CLASSIFICATIONS];
+ int curtot[STBIR_RESIZE_CLASSIFICATIONS];
+ float (* weights)[4] = retrain_weights[i];
+
+ calc_errors( weights, curtot, curerr, i );
+ if ( !do_bitmap )
+ print_weights( weights, i, curtot, curerr );
+
+ for( findex = 0 ; findex < numfileinfo ; findex++ )
+ build_bitmap( weights, i, findex );
+ }
+
+ if ( do_win )
+ draw_window();
+
+ if ( do_bitmap )
+ write_bitmap();
+}
+
+static void compare()
+{
+ int i;
+
+ trainstart = GetTickCount();
+ windowstatus = 2; // comp mode
+
+ // clear progress
+ memset( windowranges, 0, sizeof( windowranges ) );
+
+ if ( ( fi[0].numtypes != fi[1].numtypes ) || ( fi[0].numinputrects != fi[1].numinputrects ) ||
+ ( fi[0].dimensionx != fi[1].dimensionx ) || ( fi[0].dimensiony != fi[1].dimensiony ) ||
+ ( fi[0].outputscalex != fi[1].outputscalex ) || ( fi[0].outputscaley != fi[1].outputscaley ) )
+ {
+ err:
+ printf( "Timing files don't match.\n" );
+ exit(5);
+ }
+
+ for( i=0; i < fi[0].numtypes ; i++ )
+ {
+ if ( fi[0].effective[i] != fi[1].effective[i] ) goto err;
+ if ( fi[0].inputrects[i*2] != fi[1].inputrects[i*2] ) goto err;
+ if ( fi[0].inputrects[i*2+1] != fi[1].inputrects[i*2+1] ) goto err;
+ }
+
+ alloc_bitmap( 1 );
+
+ for( i = 0 ; i < fi[0].numtypes ; i++ )
+ {
+ float (* weights)[4] = stbir__compute_weights[i];
+ build_comp_bitmap( weights, i );
+ }
+
+ draw_window();
+}
+
+static void load_files( char ** args, int count )
+{
+ int i;
+
+ if ( count == 0 )
+ {
+ printf( "No timing files listed!" );
+ exit(3);
+ }
+
+ for ( i = 0 ; i < count ; i++ )
+ {
+ if ( !use_timing_file( args[i], i ) )
+ {
+ printf( "Bad timing file %s\n", args[i] );
+ exit(2);
+ }
+ }
+ numfileinfo = count;
+}
+
+int main( int argc, char ** argv )
+{
+ int check;
+ if ( argc < 3 )
+ {
+ err:
+ printf( "vf_train retrain [timing_filenames....] - recalcs weights for all the files on the command line.\n");
+ printf( "vf_train info [timing_filenames....] - shows info about each timing file.\n");
+ printf( "vf_train check [timing_filenames...] - show results for the current weights for all files listed.\n");
+ printf( "vf_train compare - compare two timing files (must only be two files and same resolution).\n");
+ printf( "vf_train bitmap [timing_filenames...] - write out results.png, comparing against the current weights for all files listed.\n");
+ exit(1);
+ }
+
+ check = ( strcmp( argv[1], "check" ) == 0 );
+ if ( ( check ) || ( strcmp( argv[1], "bitmap" ) == 0 ) )
+ {
+ load_files( argv + 2, argc - 2 );
+ alloc_bitmap( numfileinfo );
+ current( check, !check );
+ }
+ else if ( strcmp( argv[1], "info" ) == 0 )
+ {
+ load_files( argv + 2, argc - 2 );
+ info();
+ }
+ else if ( strcmp( argv[1], "compare" ) == 0 )
+ {
+ if ( argc != 4 )
+ {
+ printf( "You must specify two files to compare.\n" );
+ exit(4);
+ }
+
+ load_files( argv + 2, argc - 2 );
+ compare();
+ }
+ else if ( strcmp( argv[1], "retrain" ) == 0 )
+ {
+ load_files( argv + 2, argc - 2 );
+ alloc_bitmap( numfileinfo );
+ retrain();
+ }
+ else
+ {
+ goto err;
+ }
+
+ return 0;
+}
diff --git a/vendor/stb/stb_image_write.h b/vendor/stb/stb_image_write.h
new file mode 100644
index 0000000..e4b32ed
--- /dev/null
+++ b/vendor/stb/stb_image_write.h
@@ -0,0 +1,1724 @@
+/* stb_image_write - v1.16 - public domain - http://nothings.org/stb
+ writes out PNG/BMP/TGA/JPEG/HDR images to C stdio - Sean Barrett 2010-2015
+ no warranty implied; use at your own risk
+
+ Before #including,
+
+ #define STB_IMAGE_WRITE_IMPLEMENTATION
+
+ in the file that you want to have the implementation.
+
+ Will probably not work correctly with strict-aliasing optimizations.
+
+ABOUT:
+
+ This header file is a library for writing images to C stdio or a callback.
+
+ The PNG output is not optimal; it is 20-50% larger than the file
+ written by a decent optimizing implementation; though providing a custom
+ zlib compress function (see STBIW_ZLIB_COMPRESS) can mitigate that.
+ This library is designed for source code compactness and simplicity,
+ not optimal image file size or run-time performance.
+
+BUILDING:
+
+ You can #define STBIW_ASSERT(x) before the #include to avoid using assert.h.
+ You can #define STBIW_MALLOC(), STBIW_REALLOC(), and STBIW_FREE() to replace
+ malloc,realloc,free.
+ You can #define STBIW_MEMMOVE() to replace memmove()
+ You can #define STBIW_ZLIB_COMPRESS to use a custom zlib-style compress function
+ for PNG compression (instead of the builtin one), it must have the following signature:
+ unsigned char * my_compress(unsigned char *data, int data_len, int *out_len, int quality);
+ The returned data will be freed with STBIW_FREE() (free() by default),
+ so it must be heap allocated with STBIW_MALLOC() (malloc() by default),
+
+UNICODE:
+
+ If compiling for Windows and you wish to use Unicode filenames, compile
+ with
+ #define STBIW_WINDOWS_UTF8
+ and pass utf8-encoded filenames. Call stbiw_convert_wchar_to_utf8 to convert
+ Windows wchar_t filenames to utf8.
+
+USAGE:
+
+ There are five functions, one for each image file format:
+
+ int stbi_write_png(char const *filename, int w, int h, int comp, const void *data, int stride_in_bytes);
+ int stbi_write_bmp(char const *filename, int w, int h, int comp, const void *data);
+ int stbi_write_tga(char const *filename, int w, int h, int comp, const void *data);
+ int stbi_write_jpg(char const *filename, int w, int h, int comp, const void *data, int quality);
+ int stbi_write_hdr(char const *filename, int w, int h, int comp, const float *data);
+
+ void stbi_flip_vertically_on_write(int flag); // flag is non-zero to flip data vertically
+
+ There are also five equivalent functions that use an arbitrary write function. You are
+ expected to open/close your file-equivalent before and after calling these:
+
+ int stbi_write_png_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data, int stride_in_bytes);
+ int stbi_write_bmp_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data);
+ int stbi_write_tga_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data);
+ int stbi_write_hdr_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const float *data);
+ int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int quality);
+
+ where the callback is:
+ void stbi_write_func(void *context, void *data, int size);
+
+ You can configure it with these global variables:
+ int stbi_write_tga_with_rle; // defaults to true; set to 0 to disable RLE
+ int stbi_write_png_compression_level; // defaults to 8; set to higher for more compression
+ int stbi_write_force_png_filter; // defaults to -1; set to 0..5 to force a filter mode
+
+
+ You can define STBI_WRITE_NO_STDIO to disable the file variant of these
+ functions, so the library will not use stdio.h at all. However, this will
+ also disable HDR writing, because it requires stdio for formatted output.
+
+ Each function returns 0 on failure and non-0 on success.
+
+ The functions create an image file defined by the parameters. The image
+ is a rectangle of pixels stored from left-to-right, top-to-bottom.
+ Each pixel contains 'comp' channels of data stored interleaved with 8-bits
+ per channel, in the following order: 1=Y, 2=YA, 3=RGB, 4=RGBA. (Y is
+ monochrome color.) The rectangle is 'w' pixels wide and 'h' pixels tall.
+ The *data pointer points to the first byte of the top-left-most pixel.
+ For PNG, "stride_in_bytes" is the distance in bytes from the first byte of
+ a row of pixels to the first byte of the next row of pixels.
+
+ PNG creates output files with the same number of components as the input.
+ The BMP format expands Y to RGB in the file format and does not
+ output alpha.
+
+ PNG supports writing rectangles of data even when the bytes storing rows of
+ data are not consecutive in memory (e.g. sub-rectangles of a larger image),
+ by supplying the stride between the beginning of adjacent rows. The other
+ formats do not. (Thus you cannot write a native-format BMP through the BMP
+ writer, both because it is in BGR order and because it may have padding
+ at the end of the line.)
+
+ PNG allows you to set the deflate compression level by setting the global
+ variable 'stbi_write_png_compression_level' (it defaults to 8).
+
+ HDR expects linear float data. Since the format is always 32-bit rgb(e)
+ data, alpha (if provided) is discarded, and for monochrome data it is
+ replicated across all three channels.
+
+ TGA supports RLE or non-RLE compressed data. To use non-RLE-compressed
+ data, set the global variable 'stbi_write_tga_with_rle' to 0.
+
+   JPEG ignores alpha channels in input data; quality is between 1 and 100.
+ Higher quality looks better but results in a bigger image.
+ JPEG baseline (no JPEG progressive).
+
+CREDITS:
+
+
+ Sean Barrett - PNG/BMP/TGA
+ Baldur Karlsson - HDR
+ Jean-Sebastien Guay - TGA monochrome
+ Tim Kelsey - misc enhancements
+ Alan Hickman - TGA RLE
+ Emmanuel Julien - initial file IO callback implementation
+ Jon Olick - original jo_jpeg.cpp code
+ Daniel Gibson - integrate JPEG, allow external zlib
+ Aarni Koskela - allow choosing PNG filter
+
+ bugfixes:
+ github:Chribba
+ Guillaume Chereau
+ github:jry2
+ github:romigrou
+ Sergio Gonzalez
+ Jonas Karlsson
+ Filip Wasil
+ Thatcher Ulrich
+ github:poppolopoppo
+ Patrick Boettcher
+ github:xeekworx
+ Cap Petschulat
+ Simon Rodriguez
+ Ivan Tikhonov
+ github:ignotion
+ Adam Schackart
+ Andrew Kensler
+
+LICENSE
+
+ See end of file for license information.
+
+*/
+
+#ifndef INCLUDE_STB_IMAGE_WRITE_H
+#define INCLUDE_STB_IMAGE_WRITE_H
+
+#include <stdlib.h>  // for size_t
+
+// if STB_IMAGE_WRITE_STATIC causes problems, try defining STBIWDEF to 'inline' or 'static inline'
+#ifndef STBIWDEF
+#ifdef STB_IMAGE_WRITE_STATIC
+#define STBIWDEF static
+#else
+#ifdef __cplusplus
+#define STBIWDEF extern "C"
+#else
+#define STBIWDEF extern
+#endif
+#endif
+#endif
+
+#ifndef STB_IMAGE_WRITE_STATIC // C++ forbids static forward declarations
+STBIWDEF int stbi_write_tga_with_rle;
+STBIWDEF int stbi_write_png_compression_level;
+STBIWDEF int stbi_write_force_png_filter;
+#endif
+
+#ifndef STBI_WRITE_NO_STDIO
+STBIWDEF int stbi_write_png(char const *filename, int w, int h, int comp, const void *data, int stride_in_bytes);
+STBIWDEF int stbi_write_bmp(char const *filename, int w, int h, int comp, const void *data);
+STBIWDEF int stbi_write_tga(char const *filename, int w, int h, int comp, const void *data);
+STBIWDEF int stbi_write_hdr(char const *filename, int w, int h, int comp, const float *data);
+STBIWDEF int stbi_write_jpg(char const *filename, int x, int y, int comp, const void *data, int quality);
+
+#ifdef STBIW_WINDOWS_UTF8
+STBIWDEF int stbiw_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
+#endif
+#endif
+
+typedef void stbi_write_func(void *context, void *data, int size);
+
+STBIWDEF int stbi_write_png_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data, int stride_in_bytes);
+STBIWDEF int stbi_write_bmp_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data);
+STBIWDEF int stbi_write_tga_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data);
+STBIWDEF int stbi_write_hdr_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const float *data);
+STBIWDEF int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int quality);
+
+STBIWDEF void stbi_flip_vertically_on_write(int flip_boolean);
+
+#endif//INCLUDE_STB_IMAGE_WRITE_H
+
+#ifdef STB_IMAGE_WRITE_IMPLEMENTATION
+
+#ifdef _WIN32
+ #ifndef _CRT_SECURE_NO_WARNINGS
+ #define _CRT_SECURE_NO_WARNINGS
+ #endif
+ #ifndef _CRT_NONSTDC_NO_DEPRECATE
+ #define _CRT_NONSTDC_NO_DEPRECATE
+ #endif
+#endif
+
+#ifndef STBI_WRITE_NO_STDIO
+#include <stdio.h>
+#endif // STBI_WRITE_NO_STDIO
+
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <math.h>
+
+#if defined(STBIW_MALLOC) && defined(STBIW_FREE) && (defined(STBIW_REALLOC) || defined(STBIW_REALLOC_SIZED))
+// ok
+#elif !defined(STBIW_MALLOC) && !defined(STBIW_FREE) && !defined(STBIW_REALLOC) && !defined(STBIW_REALLOC_SIZED)
+// ok
+#else
+#error "Must define all or none of STBIW_MALLOC, STBIW_FREE, and STBIW_REALLOC (or STBIW_REALLOC_SIZED)."
+#endif
+
+#ifndef STBIW_MALLOC
+#define STBIW_MALLOC(sz) malloc(sz)
+#define STBIW_REALLOC(p,newsz) realloc(p,newsz)
+#define STBIW_FREE(p) free(p)
+#endif
+
+#ifndef STBIW_REALLOC_SIZED
+#define STBIW_REALLOC_SIZED(p,oldsz,newsz) STBIW_REALLOC(p,newsz)
+#endif
+
+
+#ifndef STBIW_MEMMOVE
+#define STBIW_MEMMOVE(a,b,sz) memmove(a,b,sz)
+#endif
+
+
+#ifndef STBIW_ASSERT
+#include <assert.h>
+#define STBIW_ASSERT(x) assert(x)
+#endif
+
+#define STBIW_UCHAR(x) (unsigned char) ((x) & 0xff)
+
+#ifdef STB_IMAGE_WRITE_STATIC
+static int stbi_write_png_compression_level = 8;
+static int stbi_write_tga_with_rle = 1;
+static int stbi_write_force_png_filter = -1;
+#else
+int stbi_write_png_compression_level = 8;
+int stbi_write_tga_with_rle = 1;
+int stbi_write_force_png_filter = -1;
+#endif
+
+static int stbi__flip_vertically_on_write = 0;
+
+STBIWDEF void stbi_flip_vertically_on_write(int flag)
+{
+ stbi__flip_vertically_on_write = flag;
+}
+
+typedef struct
+{
+ stbi_write_func *func;
+ void *context;
+ unsigned char buffer[64];
+ int buf_used;
+} stbi__write_context;
+
+// initialize a callback-based context
+static void stbi__start_write_callbacks(stbi__write_context *s, stbi_write_func *c, void *context)
+{
+ s->func = c;
+ s->context = context;
+}
+
+#ifndef STBI_WRITE_NO_STDIO
+
+static void stbi__stdio_write(void *context, void *data, int size)
+{
+ fwrite(data,1,size,(FILE*) context);
+}
+
+#if defined(_WIN32) && defined(STBIW_WINDOWS_UTF8)
+#ifdef __cplusplus
+#define STBIW_EXTERN extern "C"
+#else
+#define STBIW_EXTERN extern
+#endif
+STBIW_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
+STBIW_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
+
+STBIWDEF int stbiw_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
+{
+ return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
+}
+#endif
+
+static FILE *stbiw__fopen(char const *filename, char const *mode)
+{
+ FILE *f;
+#if defined(_WIN32) && defined(STBIW_WINDOWS_UTF8)
+ wchar_t wMode[64];
+ wchar_t wFilename[1024];
+ if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
+ return 0;
+
+ if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
+ return 0;
+
+#if defined(_MSC_VER) && _MSC_VER >= 1400
+ if (0 != _wfopen_s(&f, wFilename, wMode))
+ f = 0;
+#else
+ f = _wfopen(wFilename, wMode);
+#endif
+
+#elif defined(_MSC_VER) && _MSC_VER >= 1400
+ if (0 != fopen_s(&f, filename, mode))
+ f=0;
+#else
+ f = fopen(filename, mode);
+#endif
+ return f;
+}
+
+static int stbi__start_write_file(stbi__write_context *s, const char *filename)
+{
+ FILE *f = stbiw__fopen(filename, "wb");
+ stbi__start_write_callbacks(s, stbi__stdio_write, (void *) f);
+ return f != NULL;
+}
+
+static void stbi__end_write_file(stbi__write_context *s)
+{
+ fclose((FILE *)s->context);
+}
+
+#endif // !STBI_WRITE_NO_STDIO
+
+typedef unsigned int stbiw_uint32;
+typedef int stb_image_write_test[sizeof(stbiw_uint32)==4 ? 1 : -1]; // compile-time check that stbiw_uint32 is exactly 4 bytes
+
+static void stbiw__writefv(stbi__write_context *s, const char *fmt, va_list v)
+{
+ while (*fmt) {
+ switch (*fmt++) {
+ case ' ': break;
+ case '1': { unsigned char x = STBIW_UCHAR(va_arg(v, int));
+ s->func(s->context,&x,1);
+ break; }
+ case '2': { int x = va_arg(v,int);
+ unsigned char b[2];
+ b[0] = STBIW_UCHAR(x);
+ b[1] = STBIW_UCHAR(x>>8);
+ s->func(s->context,b,2);
+ break; }
+ case '4': { stbiw_uint32 x = va_arg(v,int);
+ unsigned char b[4];
+ b[0]=STBIW_UCHAR(x);
+ b[1]=STBIW_UCHAR(x>>8);
+ b[2]=STBIW_UCHAR(x>>16);
+ b[3]=STBIW_UCHAR(x>>24);
+ s->func(s->context,b,4);
+ break; }
+ default:
+ STBIW_ASSERT(0);
+ return;
+ }
+ }
+}
+
+static void stbiw__writef(stbi__write_context *s, const char *fmt, ...)
+{
+ va_list v;
+ va_start(v, fmt);
+ stbiw__writefv(s, fmt, v);
+ va_end(v);
+}
+
+static void stbiw__write_flush(stbi__write_context *s)
+{
+ if (s->buf_used) {
+ s->func(s->context, &s->buffer, s->buf_used);
+ s->buf_used = 0;
+ }
+}
+
+static void stbiw__putc(stbi__write_context *s, unsigned char c)
+{
+ s->func(s->context, &c, 1);
+}
+
+static void stbiw__write1(stbi__write_context *s, unsigned char a)
+{
+ if ((size_t)s->buf_used + 1 > sizeof(s->buffer))
+ stbiw__write_flush(s);
+ s->buffer[s->buf_used++] = a;
+}
+
+static void stbiw__write3(stbi__write_context *s, unsigned char a, unsigned char b, unsigned char c)
+{
+ int n;
+ if ((size_t)s->buf_used + 3 > sizeof(s->buffer))
+ stbiw__write_flush(s);
+ n = s->buf_used;
+ s->buf_used = n+3;
+ s->buffer[n+0] = a;
+ s->buffer[n+1] = b;
+ s->buffer[n+2] = c;
+}
+
+static void stbiw__write_pixel(stbi__write_context *s, int rgb_dir, int comp, int write_alpha, int expand_mono, unsigned char *d)
+{
+ unsigned char bg[3] = { 255, 0, 255}, px[3];
+ int k;
+
+ if (write_alpha < 0)
+ stbiw__write1(s, d[comp - 1]);
+
+ switch (comp) {
+ case 2: // comp == 2 means mono + alpha; alpha is written separately, so same as the 1-channel case
+ case 1:
+ if (expand_mono)
+ stbiw__write3(s, d[0], d[0], d[0]); // monochrome bmp
+ else
+ stbiw__write1(s, d[0]); // monochrome TGA
+ break;
+ case 4:
+ if (!write_alpha) {
+ // composite against pink background
+ for (k = 0; k < 3; ++k)
+ px[k] = bg[k] + ((d[k] - bg[k]) * d[3]) / 255;
+ stbiw__write3(s, px[1 - rgb_dir], px[1], px[1 + rgb_dir]);
+ break;
+ }
+ /* FALLTHROUGH */
+ case 3:
+ stbiw__write3(s, d[1 - rgb_dir], d[1], d[1 + rgb_dir]);
+ break;
+ }
+ if (write_alpha > 0)
+ stbiw__write1(s, d[comp - 1]);
+}
+
+static void stbiw__write_pixels(stbi__write_context *s, int rgb_dir, int vdir, int x, int y, int comp, void *data, int write_alpha, int scanline_pad, int expand_mono)
+{
+ stbiw_uint32 zero = 0;
+ int i,j, j_end;
+
+ if (y <= 0)
+ return;
+
+ if (stbi__flip_vertically_on_write)
+ vdir *= -1;
+
+ if (vdir < 0) {
+ j_end = -1; j = y-1;
+ } else {
+ j_end = y; j = 0;
+ }
+
+ for (; j != j_end; j += vdir) {
+ for (i=0; i < x; ++i) {
+ unsigned char *d = (unsigned char *) data + (j*x+i)*comp;
+ stbiw__write_pixel(s, rgb_dir, comp, write_alpha, expand_mono, d);
+ }
+ stbiw__write_flush(s);
+ s->func(s->context, &zero, scanline_pad);
+ }
+}
+
+static int stbiw__outfile(stbi__write_context *s, int rgb_dir, int vdir, int x, int y, int comp, int expand_mono, void *data, int alpha, int pad, const char *fmt, ...)
+{
+ if (y < 0 || x < 0) {
+ return 0;
+ } else {
+ va_list v;
+ va_start(v, fmt);
+ stbiw__writefv(s, fmt, v);
+ va_end(v);
+ stbiw__write_pixels(s,rgb_dir,vdir,x,y,comp,data,alpha,pad, expand_mono);
+ return 1;
+ }
+}
+
+static int stbi_write_bmp_core(stbi__write_context *s, int x, int y, int comp, const void *data)
+{
+ if (comp != 4) {
+ // write RGB bitmap
+ int pad = (-x*3) & 3;
+ return stbiw__outfile(s,-1,-1,x,y,comp,1,(void *) data,0,pad,
+ "11 4 22 4" "4 44 22 444444",
+ 'B', 'M', 14+40+(x*3+pad)*y, 0,0, 14+40, // file header
+ 40, x,y, 1,24, 0,0,0,0,0,0); // bitmap header
+ } else {
+ // RGBA bitmaps need a v4 header
+ // use BI_BITFIELDS mode with 32bpp and alpha mask
+ // (straight BI_RGB with alpha mask doesn't work in most readers)
+ return stbiw__outfile(s,-1,-1,x,y,comp,1,(void *)data,1,0,
+ "11 4 22 4" "4 44 22 444444 4444 4 444 444 444 444",
+ 'B', 'M', 14+108+x*y*4, 0, 0, 14+108, // file header
+ 108, x,y, 1,32, 3,0,0,0,0,0, 0xff0000,0xff00,0xff,0xff000000u, 0, 0,0,0, 0,0,0, 0,0,0, 0,0,0); // bitmap V4 header
+ }
+}
+
+STBIWDEF int stbi_write_bmp_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data)
+{
+ stbi__write_context s = { 0 };
+ stbi__start_write_callbacks(&s, func, context);
+ return stbi_write_bmp_core(&s, x, y, comp, data);
+}
+
+#ifndef STBI_WRITE_NO_STDIO
+STBIWDEF int stbi_write_bmp(char const *filename, int x, int y, int comp, const void *data)
+{
+ stbi__write_context s = { 0 };
+ if (stbi__start_write_file(&s,filename)) {
+ int r = stbi_write_bmp_core(&s, x, y, comp, data);
+ stbi__end_write_file(&s);
+ return r;
+ } else
+ return 0;
+}
+#endif //!STBI_WRITE_NO_STDIO
+
+static int stbi_write_tga_core(stbi__write_context *s, int x, int y, int comp, void *data)
+{
+ int has_alpha = (comp == 2 || comp == 4);
+ int colorbytes = has_alpha ? comp-1 : comp;
+ int format = colorbytes < 2 ? 3 : 2; // TGA image type: 2 = uncompressed true-color (RGB/RGBA), 3 = uncompressed grayscale (Y/YA)
+
+ if (y < 0 || x < 0)
+ return 0;
+
+ if (!stbi_write_tga_with_rle) {
+ return stbiw__outfile(s, -1, -1, x, y, comp, 0, (void *) data, has_alpha, 0,
+ "111 221 2222 11", 0, 0, format, 0, 0, 0, 0, 0, x, y, (colorbytes + has_alpha) * 8, has_alpha * 8);
+ } else {
+ int i,j,k;
+ int jend, jdir;
+
+ stbiw__writef(s, "111 221 2222 11", 0,0,format+8, 0,0,0, 0,0,x,y, (colorbytes + has_alpha) * 8, has_alpha * 8);
+
+ if (stbi__flip_vertically_on_write) {
+ j = 0;
+ jend = y;
+ jdir = 1;
+ } else {
+ j = y-1;
+ jend = -1;
+ jdir = -1;
+ }
+ for (; j != jend; j += jdir) {
+ unsigned char *row = (unsigned char *) data + j * x * comp;
+ int len;
+
+ for (i = 0; i < x; i += len) {
+ unsigned char *begin = row + i * comp;
+ int diff = 1;
+ len = 1;
+
+ if (i < x - 1) {
+ ++len;
+ diff = memcmp(begin, row + (i + 1) * comp, comp);
+ if (diff) {
+ const unsigned char *prev = begin;
+ for (k = i + 2; k < x && len < 128; ++k) {
+ if (memcmp(prev, row + k * comp, comp)) {
+ prev += comp;
+ ++len;
+ } else {
+ --len;
+ break;
+ }
+ }
+ } else {
+ for (k = i + 2; k < x && len < 128; ++k) {
+ if (!memcmp(begin, row + k * comp, comp)) {
+ ++len;
+ } else {
+ break;
+ }
+ }
+ }
+ }
+
+ if (diff) {
+ unsigned char header = STBIW_UCHAR(len - 1);
+ stbiw__write1(s, header);
+ for (k = 0; k < len; ++k) {
+ stbiw__write_pixel(s, -1, comp, has_alpha, 0, begin + k * comp);
+ }
+ } else {
+ unsigned char header = STBIW_UCHAR(len - 129);
+ stbiw__write1(s, header);
+ stbiw__write_pixel(s, -1, comp, has_alpha, 0, begin);
+ }
+ }
+ }
+ stbiw__write_flush(s);
+ }
+ return 1;
+}
+
+STBIWDEF int stbi_write_tga_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data)
+{
+ stbi__write_context s = { 0 };
+ stbi__start_write_callbacks(&s, func, context);
+ return stbi_write_tga_core(&s, x, y, comp, (void *) data);
+}
+
+#ifndef STBI_WRITE_NO_STDIO
+STBIWDEF int stbi_write_tga(char const *filename, int x, int y, int comp, const void *data)
+{
+ stbi__write_context s = { 0 };
+ if (stbi__start_write_file(&s,filename)) {
+ int r = stbi_write_tga_core(&s, x, y, comp, (void *) data);
+ stbi__end_write_file(&s);
+ return r;
+ } else
+ return 0;
+}
+#endif
+
+// *************************************************************************************************
+// Radiance RGBE HDR writer
+// by Baldur Karlsson
+
+#define stbiw__max(a, b) ((a) > (b) ? (a) : (b))
+
+#ifndef STBI_WRITE_NO_STDIO
+
+static void stbiw__linear_to_rgbe(unsigned char *rgbe, float *linear)
+{
+ int exponent;
+ float maxcomp = stbiw__max(linear[0], stbiw__max(linear[1], linear[2]));
+
+ if (maxcomp < 1e-32f) {
+ rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
+ } else {
+ float normalize = (float) frexp(maxcomp, &exponent) * 256.0f/maxcomp;
+
+ rgbe[0] = (unsigned char)(linear[0] * normalize);
+ rgbe[1] = (unsigned char)(linear[1] * normalize);
+ rgbe[2] = (unsigned char)(linear[2] * normalize);
+ rgbe[3] = (unsigned char)(exponent + 128);
+ }
+}
+
+static void stbiw__write_run_data(stbi__write_context *s, int length, unsigned char databyte)
+{
+ unsigned char lengthbyte = STBIW_UCHAR(length+128);
+ STBIW_ASSERT(length+128 <= 255);
+ s->func(s->context, &lengthbyte, 1);
+ s->func(s->context, &databyte, 1);
+}
+
+static void stbiw__write_dump_data(stbi__write_context *s, int length, unsigned char *data)
+{
+ unsigned char lengthbyte = STBIW_UCHAR(length);
+ STBIW_ASSERT(length <= 128); // inconsistent with spec but consistent with official code
+ s->func(s->context, &lengthbyte, 1);
+ s->func(s->context, data, length);
+}
+
+static void stbiw__write_hdr_scanline(stbi__write_context *s, int width, int ncomp, unsigned char *scratch, float *scanline)
+{
+ unsigned char scanlineheader[4] = { 2, 2, 0, 0 };
+ unsigned char rgbe[4];
+ float linear[3];
+ int x;
+
+ scanlineheader[2] = (width&0xff00)>>8;
+ scanlineheader[3] = (width&0x00ff);
+
+ /* skip RLE for images too small or large */
+ if (width < 8 || width >= 32768) {
+ for (x=0; x < width; x++) {
+ switch (ncomp) {
+ case 4: /* fallthrough */
+ case 3: linear[2] = scanline[x*ncomp + 2];
+ linear[1] = scanline[x*ncomp + 1];
+ linear[0] = scanline[x*ncomp + 0];
+ break;
+ default:
+ linear[0] = linear[1] = linear[2] = scanline[x*ncomp + 0];
+ break;
+ }
+ stbiw__linear_to_rgbe(rgbe, linear);
+ s->func(s->context, rgbe, 4);
+ }
+ } else {
+ int c,r;
+ /* encode into scratch buffer */
+ for (x=0; x < width; x++) {
+ switch(ncomp) {
+ case 4: /* fallthrough */
+ case 3: linear[2] = scanline[x*ncomp + 2];
+ linear[1] = scanline[x*ncomp + 1];
+ linear[0] = scanline[x*ncomp + 0];
+ break;
+ default:
+ linear[0] = linear[1] = linear[2] = scanline[x*ncomp + 0];
+ break;
+ }
+ stbiw__linear_to_rgbe(rgbe, linear);
+ scratch[x + width*0] = rgbe[0];
+ scratch[x + width*1] = rgbe[1];
+ scratch[x + width*2] = rgbe[2];
+ scratch[x + width*3] = rgbe[3];
+ }
+
+ s->func(s->context, scanlineheader, 4);
+
+ /* RLE each component separately */
+ for (c=0; c < 4; c++) {
+ unsigned char *comp = &scratch[width*c];
+
+ x = 0;
+ while (x < width) {
+ // find first run
+ r = x;
+ while (r+2 < width) {
+ if (comp[r] == comp[r+1] && comp[r] == comp[r+2])
+ break;
+ ++r;
+ }
+ if (r+2 >= width)
+ r = width;
+ // dump up to first run
+ while (x < r) {
+ int len = r-x;
+ if (len > 128) len = 128;
+ stbiw__write_dump_data(s, len, &comp[x]);
+ x += len;
+ }
+ // if there's a run, output it
+ if (r+2 < width) { // same test as in the search loop above, so this is only true if we broke out on a run
+ // find next byte after run
+ while (r < width && comp[r] == comp[x])
+ ++r;
+ // output run up to r
+ while (x < r) {
+ int len = r-x;
+ if (len > 127) len = 127;
+ stbiw__write_run_data(s, len, comp[x]);
+ x += len;
+ }
+ }
+ }
+ }
+ }
+}
+
+static int stbi_write_hdr_core(stbi__write_context *s, int x, int y, int comp, float *data)
+{
+ if (y <= 0 || x <= 0 || data == NULL)
+ return 0;
+ else {
+ // Each component is stored separately. Allocate scratch space for full output scanline.
+ unsigned char *scratch = (unsigned char *) STBIW_MALLOC(x*4);
+ int i, len;
+ char buffer[128];
+ char header[] = "#?RADIANCE\n# Written by stb_image_write.h\nFORMAT=32-bit_rle_rgbe\n";
+ s->func(s->context, header, sizeof(header)-1);
+
+#ifdef __STDC_LIB_EXT1__
+ len = sprintf_s(buffer, sizeof(buffer), "EXPOSURE= 1.0000000000000\n\n-Y %d +X %d\n", y, x);
+#else
+ len = sprintf(buffer, "EXPOSURE= 1.0000000000000\n\n-Y %d +X %d\n", y, x);
+#endif
+ s->func(s->context, buffer, len);
+
+ for(i=0; i < y; i++)
+ stbiw__write_hdr_scanline(s, x, comp, scratch, data + comp*x*(stbi__flip_vertically_on_write ? y-1-i : i));
+ STBIW_FREE(scratch);
+ return 1;
+ }
+}
+
+STBIWDEF int stbi_write_hdr_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const float *data)
+{
+ stbi__write_context s = { 0 };
+ stbi__start_write_callbacks(&s, func, context);
+ return stbi_write_hdr_core(&s, x, y, comp, (float *) data);
+}
+
+STBIWDEF int stbi_write_hdr(char const *filename, int x, int y, int comp, const float *data)
+{
+ stbi__write_context s = { 0 };
+ if (stbi__start_write_file(&s,filename)) {
+ int r = stbi_write_hdr_core(&s, x, y, comp, (float *) data);
+ stbi__end_write_file(&s);
+ return r;
+ } else
+ return 0;
+}
+#endif // STBI_WRITE_NO_STDIO
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// PNG writer
+//
+
+#ifndef STBIW_ZLIB_COMPRESS
+// stretchy buffer; stbiw__sbpush() == vector<>::push_back() -- stbiw__sbcount() == vector<>::size()
+#define stbiw__sbraw(a) ((int *) (void *) (a) - 2)
+#define stbiw__sbm(a) stbiw__sbraw(a)[0]
+#define stbiw__sbn(a) stbiw__sbraw(a)[1]
+
+#define stbiw__sbneedgrow(a,n) ((a)==0 || stbiw__sbn(a)+n >= stbiw__sbm(a))
+#define stbiw__sbmaybegrow(a,n) (stbiw__sbneedgrow(a,(n)) ? stbiw__sbgrow(a,n) : 0)
+#define stbiw__sbgrow(a,n) stbiw__sbgrowf((void **) &(a), (n), sizeof(*(a)))
+
+#define stbiw__sbpush(a, v) (stbiw__sbmaybegrow(a,1), (a)[stbiw__sbn(a)++] = (v))
+#define stbiw__sbcount(a) ((a) ? stbiw__sbn(a) : 0)
+#define stbiw__sbfree(a) ((a) ? STBIW_FREE(stbiw__sbraw(a)),0 : 0)
+
+static void *stbiw__sbgrowf(void **arr, int increment, int itemsize)
+{
+ int m = *arr ? 2*stbiw__sbm(*arr)+increment : increment+1;
+ void *p = STBIW_REALLOC_SIZED(*arr ? stbiw__sbraw(*arr) : 0, *arr ? (stbiw__sbm(*arr)*itemsize + sizeof(int)*2) : 0, itemsize * m + sizeof(int)*2);
+ STBIW_ASSERT(p);
+ if (p) {
+ if (!*arr) ((int *) p)[1] = 0;
+ *arr = (void *) ((int *) p + 2);
+ stbiw__sbm(*arr) = m;
+ }
+ return *arr;
+}
+
+static unsigned char *stbiw__zlib_flushf(unsigned char *data, unsigned int *bitbuffer, int *bitcount)
+{
+ while (*bitcount >= 8) {
+ stbiw__sbpush(data, STBIW_UCHAR(*bitbuffer));
+ *bitbuffer >>= 8;
+ *bitcount -= 8;
+ }
+ return data;
+}
+
+static int stbiw__zlib_bitrev(int code, int codebits)
+{
+ int res=0;
+ while (codebits--) {
+ res = (res << 1) | (code & 1);
+ code >>= 1;
+ }
+ return res;
+}
+
+static unsigned int stbiw__zlib_countm(unsigned char *a, unsigned char *b, int limit)
+{
+ int i;
+ for (i=0; i < limit && i < 258; ++i)
+ if (a[i] != b[i]) break;
+ return i;
+}
+
+static unsigned int stbiw__zhash(unsigned char *data)
+{
+ stbiw_uint32 hash = data[0] + (data[1] << 8) + (data[2] << 16);
+ hash ^= hash << 3;
+ hash += hash >> 5;
+ hash ^= hash << 4;
+ hash += hash >> 17;
+ hash ^= hash << 25;
+ hash += hash >> 6;
+ return hash;
+}
+
+#define stbiw__zlib_flush() (out = stbiw__zlib_flushf(out, &bitbuf, &bitcount))
+#define stbiw__zlib_add(code,codebits) \
+ (bitbuf |= (code) << bitcount, bitcount += (codebits), stbiw__zlib_flush())
+#define stbiw__zlib_huffa(b,c) stbiw__zlib_add(stbiw__zlib_bitrev(b,c),c)
+// default huffman tables
+#define stbiw__zlib_huff1(n) stbiw__zlib_huffa(0x30 + (n), 8)
+#define stbiw__zlib_huff2(n) stbiw__zlib_huffa(0x190 + (n)-144, 9)
+#define stbiw__zlib_huff3(n) stbiw__zlib_huffa(0 + (n)-256,7)
+#define stbiw__zlib_huff4(n) stbiw__zlib_huffa(0xc0 + (n)-280,8)
+#define stbiw__zlib_huff(n) ((n) <= 143 ? stbiw__zlib_huff1(n) : (n) <= 255 ? stbiw__zlib_huff2(n) : (n) <= 279 ? stbiw__zlib_huff3(n) : stbiw__zlib_huff4(n))
+#define stbiw__zlib_huffb(n) ((n) <= 143 ? stbiw__zlib_huff1(n) : stbiw__zlib_huff2(n))
+
+#define stbiw__ZHASH 16384
+
+#endif // STBIW_ZLIB_COMPRESS
+
+STBIWDEF unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_len, int quality)
+{
+#ifdef STBIW_ZLIB_COMPRESS
+ // user provided a zlib compress implementation, use that
+ return STBIW_ZLIB_COMPRESS(data, data_len, out_len, quality);
+#else // use builtin
+ static unsigned short lengthc[] = { 3,4,5,6,7,8,9,10,11,13,15,17,19,23,27,31,35,43,51,59,67,83,99,115,131,163,195,227,258, 259 };
+ static unsigned char lengtheb[]= { 0,0,0,0,0,0,0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };
+ static unsigned short distc[] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577, 32768 };
+ static unsigned char disteb[] = { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13 };
+ unsigned int bitbuf=0;
+ int i,j, bitcount=0;
+ unsigned char *out = NULL;
+ unsigned char ***hash_table = (unsigned char***) STBIW_MALLOC(stbiw__ZHASH * sizeof(unsigned char**));
+ if (hash_table == NULL)
+ return NULL;
+ if (quality < 5) quality = 5;
+
+ stbiw__sbpush(out, 0x78); // DEFLATE 32K window
+ stbiw__sbpush(out, 0x5e); // FLEVEL = 1
+ stbiw__zlib_add(1,1); // BFINAL = 1
+ stbiw__zlib_add(1,2); // BTYPE = 1 -- fixed huffman
+
+ for (i=0; i < stbiw__ZHASH; ++i)
+ hash_table[i] = NULL;
+
+ i=0;
+ while (i < data_len-3) {
+ // hash next 3 bytes of data to be compressed
+ int h = stbiw__zhash(data+i)&(stbiw__ZHASH-1), best=3;
+ unsigned char *bestloc = 0;
+ unsigned char **hlist = hash_table[h];
+ int n = stbiw__sbcount(hlist);
+ for (j=0; j < n; ++j) {
+ if (hlist[j]-data > i-32768) { // if entry lies within window
+ int d = stbiw__zlib_countm(hlist[j], data+i, data_len-i);
+ if (d >= best) { best=d; bestloc=hlist[j]; }
+ }
+ }
+ // when hash table entry is too long, delete half the entries
+ if (hash_table[h] && stbiw__sbn(hash_table[h]) == 2*quality) {
+ STBIW_MEMMOVE(hash_table[h], hash_table[h]+quality, sizeof(hash_table[h][0])*quality);
+ stbiw__sbn(hash_table[h]) = quality;
+ }
+ stbiw__sbpush(hash_table[h],data+i);
+
+ if (bestloc) {
+ // "lazy matching" - check the match at the *next* byte; if it's better, emit the current byte as a literal
+ h = stbiw__zhash(data+i+1)&(stbiw__ZHASH-1);
+ hlist = hash_table[h];
+ n = stbiw__sbcount(hlist);
+ for (j=0; j < n; ++j) {
+ if (hlist[j]-data > i-32767) {
+ int e = stbiw__zlib_countm(hlist[j], data+i+1, data_len-i-1);
+ if (e > best) { // if next match is better, bail on current match
+ bestloc = NULL;
+ break;
+ }
+ }
+ }
+ }
+
+ if (bestloc) {
+ int d = (int) (data+i - bestloc); // distance back
+ STBIW_ASSERT(d <= 32767 && best <= 258);
+ for (j=0; best > lengthc[j+1]-1; ++j);
+ stbiw__zlib_huff(j+257);
+ if (lengtheb[j]) stbiw__zlib_add(best - lengthc[j], lengtheb[j]);
+ for (j=0; d > distc[j+1]-1; ++j);
+ stbiw__zlib_add(stbiw__zlib_bitrev(j,5),5);
+ if (disteb[j]) stbiw__zlib_add(d - distc[j], disteb[j]);
+ i += best;
+ } else {
+ stbiw__zlib_huffb(data[i]);
+ ++i;
+ }
+ }
+ // write out final bytes
+ for (;i < data_len; ++i)
+ stbiw__zlib_huffb(data[i]);
+ stbiw__zlib_huff(256); // end of block
+ // pad with 0 bits to byte boundary
+ while (bitcount)
+ stbiw__zlib_add(0,1);
+
+ for (i=0; i < stbiw__ZHASH; ++i)
+ (void) stbiw__sbfree(hash_table[i]);
+ STBIW_FREE(hash_table);
+
+ // store uncompressed instead if compression was worse
+ if (stbiw__sbn(out) > data_len + 2 + ((data_len+32766)/32767)*5) {
+ stbiw__sbn(out) = 2; // truncate to DEFLATE 32K window and FLEVEL = 1
+ for (j = 0; j < data_len;) {
+ int blocklen = data_len - j;
+ if (blocklen > 32767) blocklen = 32767;
+ stbiw__sbpush(out, data_len - j == blocklen); // BFINAL = ?, BTYPE = 0 -- no compression
+ stbiw__sbpush(out, STBIW_UCHAR(blocklen)); // LEN
+ stbiw__sbpush(out, STBIW_UCHAR(blocklen >> 8));
+ stbiw__sbpush(out, STBIW_UCHAR(~blocklen)); // NLEN
+ stbiw__sbpush(out, STBIW_UCHAR(~blocklen >> 8));
+ memcpy(out+stbiw__sbn(out), data+j, blocklen);
+ stbiw__sbn(out) += blocklen;
+ j += blocklen;
+ }
+ }
+
+ {
+ // compute adler32 on input
+ unsigned int s1=1, s2=0;
+ int blocklen = (int) (data_len % 5552);
+ j=0;
+ while (j < data_len) {
+ for (i=0; i < blocklen; ++i) { s1 += data[j+i]; s2 += s1; }
+ s1 %= 65521; s2 %= 65521;
+ j += blocklen;
+ blocklen = 5552;
+ }
+ stbiw__sbpush(out, STBIW_UCHAR(s2 >> 8));
+ stbiw__sbpush(out, STBIW_UCHAR(s2));
+ stbiw__sbpush(out, STBIW_UCHAR(s1 >> 8));
+ stbiw__sbpush(out, STBIW_UCHAR(s1));
+ }
+ *out_len = stbiw__sbn(out);
+ // make returned pointer freeable
+ STBIW_MEMMOVE(stbiw__sbraw(out), out, *out_len);
+ return (unsigned char *) stbiw__sbraw(out);
+#endif // STBIW_ZLIB_COMPRESS
+}
+
+static unsigned int stbiw__crc32(unsigned char *buffer, int len)
+{
+#ifdef STBIW_CRC32
+ return STBIW_CRC32(buffer, len);
+#else
+ static unsigned int crc_table[256] =
+ {
+ 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419, 0x706AF48F, 0xE963A535, 0x9E6495A3,
+ 0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988, 0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91,
+ 0x1DB71064, 0x6AB020F2, 0xF3B97148, 0x84BE41DE, 0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7,
+ 0x136C9856, 0x646BA8C0, 0xFD62F97A, 0x8A65C9EC, 0x14015C4F, 0x63066CD9, 0xFA0F3D63, 0x8D080DF5,
+ 0x3B6E20C8, 0x4C69105E, 0xD56041E4, 0xA2677172, 0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B,
+ 0x35B5A8FA, 0x42B2986C, 0xDBBBC9D6, 0xACBCF940, 0x32D86CE3, 0x45DF5C75, 0xDCD60DCF, 0xABD13D59,
+ 0x26D930AC, 0x51DE003A, 0xC8D75180, 0xBFD06116, 0x21B4F4B5, 0x56B3C423, 0xCFBA9599, 0xB8BDA50F,
+ 0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924, 0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D,
+ 0x76DC4190, 0x01DB7106, 0x98D220BC, 0xEFD5102A, 0x71B18589, 0x06B6B51F, 0x9FBFE4A5, 0xE8B8D433,
+ 0x7807C9A2, 0x0F00F934, 0x9609A88E, 0xE10E9818, 0x7F6A0DBB, 0x086D3D2D, 0x91646C97, 0xE6635C01,
+ 0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E, 0x6C0695ED, 0x1B01A57B, 0x8208F4C1, 0xF50FC457,
+ 0x65B0D9C6, 0x12B7E950, 0x8BBEB8EA, 0xFCB9887C, 0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3, 0xFBD44C65,
+ 0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2, 0x4ADFA541, 0x3DD895D7, 0xA4D1C46D, 0xD3D6F4FB,
+ 0x4369E96A, 0x346ED9FC, 0xAD678846, 0xDA60B8D0, 0x44042D73, 0x33031DE5, 0xAA0A4C5F, 0xDD0D7CC9,
+ 0x5005713C, 0x270241AA, 0xBE0B1010, 0xC90C2086, 0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F,
+ 0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4, 0x59B33D17, 0x2EB40D81, 0xB7BD5C3B, 0xC0BA6CAD,
+ 0xEDB88320, 0x9ABFB3B6, 0x03B6E20C, 0x74B1D29A, 0xEAD54739, 0x9DD277AF, 0x04DB2615, 0x73DC1683,
+ 0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8, 0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1,
+ 0xF00F9344, 0x8708A3D2, 0x1E01F268, 0x6906C2FE, 0xF762575D, 0x806567CB, 0x196C3671, 0x6E6B06E7,
+ 0xFED41B76, 0x89D32BE0, 0x10DA7A5A, 0x67DD4ACC, 0xF9B9DF6F, 0x8EBEEFF9, 0x17B7BE43, 0x60B08ED5,
+ 0xD6D6A3E8, 0xA1D1937E, 0x38D8C2C4, 0x4FDFF252, 0xD1BB67F1, 0xA6BC5767, 0x3FB506DD, 0x48B2364B,
+ 0xD80D2BDA, 0xAF0A1B4C, 0x36034AF6, 0x41047A60, 0xDF60EFC3, 0xA867DF55, 0x316E8EEF, 0x4669BE79,
+ 0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236, 0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F,
+ 0xC5BA3BBE, 0xB2BD0B28, 0x2BB45A92, 0x5CB36A04, 0xC2D7FFA7, 0xB5D0CF31, 0x2CD99E8B, 0x5BDEAE1D,
+ 0x9B64C2B0, 0xEC63F226, 0x756AA39C, 0x026D930A, 0x9C0906A9, 0xEB0E363F, 0x72076785, 0x05005713,
+ 0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38, 0x92D28E9B, 0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21,
+ 0x86D3D2D4, 0xF1D4E242, 0x68DDB3F8, 0x1FDA836E, 0x81BE16CD, 0xF6B9265B, 0x6FB077E1, 0x18B74777,
+ 0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C, 0x8F659EFF, 0xF862AE69, 0x616BFFD3, 0x166CCF45,
+ 0xA00AE278, 0xD70DD2EE, 0x4E048354, 0x3903B3C2, 0xA7672661, 0xD06016F7, 0x4969474D, 0x3E6E77DB,
+ 0xAED16A4A, 0xD9D65ADC, 0x40DF0B66, 0x37D83BF0, 0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9,
+ 0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6, 0xBAD03605, 0xCDD70693, 0x54DE5729, 0x23D967BF,
+ 0xB3667A2E, 0xC4614AB8, 0x5D681B02, 0x2A6F2B94, 0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D
+ };
+
+ unsigned int crc = ~0u;
+ int i;
+ for (i=0; i < len; ++i)
+ crc = (crc >> 8) ^ crc_table[buffer[i] ^ (crc & 0xff)];
+ return ~crc;
+#endif
+}
+
+#define stbiw__wpng4(o,a,b,c,d) ((o)[0]=STBIW_UCHAR(a),(o)[1]=STBIW_UCHAR(b),(o)[2]=STBIW_UCHAR(c),(o)[3]=STBIW_UCHAR(d),(o)+=4)
+#define stbiw__wp32(data,v) stbiw__wpng4(data, (v)>>24,(v)>>16,(v)>>8,(v))
+#define stbiw__wptag(data,s) stbiw__wpng4(data, s[0],s[1],s[2],s[3])
+
+static void stbiw__wpcrc(unsigned char **data, int len)
+{
+ unsigned int crc = stbiw__crc32(*data - len - 4, len+4);
+ stbiw__wp32(*data, crc);
+}
+
+static unsigned char stbiw__paeth(int a, int b, int c)
+{
+ int p = a + b - c, pa = abs(p-a), pb = abs(p-b), pc = abs(p-c);
+ if (pa <= pb && pa <= pc) return STBIW_UCHAR(a);
+ if (pb <= pc) return STBIW_UCHAR(b);
+ return STBIW_UCHAR(c);
+}
+
+// @OPTIMIZE: provide an option that always forces left-predict or paeth predict
+static void stbiw__encode_png_line(unsigned char *pixels, int stride_bytes, int width, int height, int y, int n, int filter_type, signed char *line_buffer)
+{
+ static int mapping[] = { 0,1,2,3,4 };
+ static int firstmap[] = { 0,1,0,5,6 };
+ int *mymap = (y != 0) ? mapping : firstmap;
+ int i;
+ int type = mymap[filter_type];
+ unsigned char *z = pixels + stride_bytes * (stbi__flip_vertically_on_write ? height-1-y : y);
+ int signed_stride = stbi__flip_vertically_on_write ? -stride_bytes : stride_bytes;
+
+ if (type==0) {
+ memcpy(line_buffer, z, width*n);
+ return;
+ }
+
+ // first loop isn't optimized since it's just one pixel
+ for (i = 0; i < n; ++i) {
+ switch (type) {
+ case 1: line_buffer[i] = z[i]; break;
+ case 2: line_buffer[i] = z[i] - z[i-signed_stride]; break;
+ case 3: line_buffer[i] = z[i] - (z[i-signed_stride]>>1); break;
+ case 4: line_buffer[i] = (signed char) (z[i] - stbiw__paeth(0,z[i-signed_stride],0)); break;
+ case 5: line_buffer[i] = z[i]; break;
+ case 6: line_buffer[i] = z[i]; break;
+ }
+ }
+ switch (type) {
+ case 1: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - z[i-n]; break;
+ case 2: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - z[i-signed_stride]; break;
+ case 3: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - ((z[i-n] + z[i-signed_stride])>>1); break;
+ case 4: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - stbiw__paeth(z[i-n], z[i-signed_stride], z[i-signed_stride-n]); break;
+ case 5: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - (z[i-n]>>1); break;
+ case 6: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - stbiw__paeth(z[i-n], 0,0); break;
+ }
+}
+
+STBIWDEF unsigned char *stbi_write_png_to_mem(const unsigned char *pixels, int stride_bytes, int x, int y, int n, int *out_len)
+{
+ int force_filter = stbi_write_force_png_filter;
+ int ctype[5] = { -1, 0, 4, 2, 6 };
+ unsigned char sig[8] = { 137,80,78,71,13,10,26,10 };
+ unsigned char *out,*o, *filt, *zlib;
+ signed char *line_buffer;
+ int j,zlen;
+
+ if (stride_bytes == 0)
+ stride_bytes = x * n;
+
+ if (force_filter >= 5) {
+ force_filter = -1;
+ }
+
+ filt = (unsigned char *) STBIW_MALLOC((x*n+1) * y); if (!filt) return 0;
+ line_buffer = (signed char *) STBIW_MALLOC(x * n); if (!line_buffer) { STBIW_FREE(filt); return 0; }
+ for (j=0; j < y; ++j) {
+ int filter_type;
+ if (force_filter > -1) {
+ filter_type = force_filter;
+ stbiw__encode_png_line((unsigned char*)(pixels), stride_bytes, x, y, j, n, force_filter, line_buffer);
+ } else { // Estimate the best filter by running through all of them:
+ int best_filter = 0, best_filter_val = 0x7fffffff, est, i;
+ for (filter_type = 0; filter_type < 5; filter_type++) {
+ stbiw__encode_png_line((unsigned char*)(pixels), stride_bytes, x, y, j, n, filter_type, line_buffer);
+
+ // Estimate the entropy of the line using this filter; the less, the better.
+ est = 0;
+ for (i = 0; i < x*n; ++i) {
+ est += abs((signed char) line_buffer[i]);
+ }
+ if (est < best_filter_val) {
+ best_filter_val = est;
+ best_filter = filter_type;
+ }
+ }
+ if (filter_type != best_filter) { // If the last iteration already got us the best filter, don't redo it
+ stbiw__encode_png_line((unsigned char*)(pixels), stride_bytes, x, y, j, n, best_filter, line_buffer);
+ filter_type = best_filter;
+ }
+ }
+ // when we get here, filter_type contains the filter type, and line_buffer contains the data
+ filt[j*(x*n+1)] = (unsigned char) filter_type;
+ STBIW_MEMMOVE(filt+j*(x*n+1)+1, line_buffer, x*n);
+ }
+ STBIW_FREE(line_buffer);
+ zlib = stbi_zlib_compress(filt, y*( x*n+1), &zlen, stbi_write_png_compression_level);
+ STBIW_FREE(filt);
+ if (!zlib) return 0;
+
+ // each PNG chunk carries 12 bytes of overhead: 4-byte length, 4-byte type, 4-byte CRC
+ out = (unsigned char *) STBIW_MALLOC(8 + 12+13 + 12+zlen + 12);
+ if (!out) return 0;
+ *out_len = 8 + 12+13 + 12+zlen + 12;
+
+ o=out;
+ STBIW_MEMMOVE(o,sig,8); o+= 8;
+ stbiw__wp32(o, 13); // header length
+ stbiw__wptag(o, "IHDR");
+ stbiw__wp32(o, x);
+ stbiw__wp32(o, y);
+ *o++ = 8;
+ *o++ = STBIW_UCHAR(ctype[n]);
+ *o++ = 0;
+ *o++ = 0;
+ *o++ = 0;
+ stbiw__wpcrc(&o,13);
+
+ stbiw__wp32(o, zlen);
+ stbiw__wptag(o, "IDAT");
+ STBIW_MEMMOVE(o, zlib, zlen);
+ o += zlen;
+ STBIW_FREE(zlib);
+ stbiw__wpcrc(&o, zlen);
+
+ stbiw__wp32(o,0);
+ stbiw__wptag(o, "IEND");
+ stbiw__wpcrc(&o,0);
+
+ STBIW_ASSERT(o == out + *out_len);
+
+ return out;
+}
+
+#ifndef STBI_WRITE_NO_STDIO
+STBIWDEF int stbi_write_png(char const *filename, int x, int y, int comp, const void *data, int stride_bytes)
+{
+ FILE *f;
+ int len;
+ unsigned char *png = stbi_write_png_to_mem((const unsigned char *) data, stride_bytes, x, y, comp, &len);
+ if (png == NULL) return 0;
+
+ f = stbiw__fopen(filename, "wb");
+ if (!f) { STBIW_FREE(png); return 0; }
+ fwrite(png, 1, len, f);
+ fclose(f);
+ STBIW_FREE(png);
+ return 1;
+}
+#endif
+
+STBIWDEF int stbi_write_png_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int stride_bytes)
+{
+ int len;
+ unsigned char *png = stbi_write_png_to_mem((const unsigned char *) data, stride_bytes, x, y, comp, &len);
+ if (png == NULL) return 0;
+ func(context, png, len);
+ STBIW_FREE(png);
+ return 1;
+}
+
+
+/* ***************************************************************************
+ *
+ * JPEG writer
+ *
+ * This is based on Jon Olick's jo_jpeg.cpp:
+ * public domain Simple, Minimalistic JPEG writer - http://www.jonolick.com/code.html
+ */
+
+static const unsigned char stbiw__jpg_ZigZag[] = { 0,1,5,6,14,15,27,28,2,4,7,13,16,26,29,42,3,8,12,17,25,30,41,43,9,11,18,
+ 24,31,40,44,53,10,19,23,32,39,45,52,54,20,22,33,38,46,51,55,60,21,34,37,47,50,56,59,61,35,36,48,49,57,58,62,63 };
+
+static void stbiw__jpg_writeBits(stbi__write_context *s, int *bitBufP, int *bitCntP, const unsigned short *bs) {
+ int bitBuf = *bitBufP, bitCnt = *bitCntP;
+ bitCnt += bs[1];
+ bitBuf |= bs[0] << (24 - bitCnt);
+ while(bitCnt >= 8) {
+ unsigned char c = (bitBuf >> 16) & 255;
+ stbiw__putc(s, c);
+ if(c == 255) {
+ stbiw__putc(s, 0);
+ }
+ bitBuf <<= 8;
+ bitCnt -= 8;
+ }
+ *bitBufP = bitBuf;
+ *bitCntP = bitCnt;
+}
+
+static void stbiw__jpg_DCT(float *d0p, float *d1p, float *d2p, float *d3p, float *d4p, float *d5p, float *d6p, float *d7p) {
+ float d0 = *d0p, d1 = *d1p, d2 = *d2p, d3 = *d3p, d4 = *d4p, d5 = *d5p, d6 = *d6p, d7 = *d7p;
+ float z1, z2, z3, z4, z5, z11, z13;
+
+ float tmp0 = d0 + d7;
+ float tmp7 = d0 - d7;
+ float tmp1 = d1 + d6;
+ float tmp6 = d1 - d6;
+ float tmp2 = d2 + d5;
+ float tmp5 = d2 - d5;
+ float tmp3 = d3 + d4;
+ float tmp4 = d3 - d4;
+
+ // Even part
+ float tmp10 = tmp0 + tmp3; // phase 2
+ float tmp13 = tmp0 - tmp3;
+ float tmp11 = tmp1 + tmp2;
+ float tmp12 = tmp1 - tmp2;
+
+ d0 = tmp10 + tmp11; // phase 3
+ d4 = tmp10 - tmp11;
+
+ z1 = (tmp12 + tmp13) * 0.707106781f; // c4
+ d2 = tmp13 + z1; // phase 5
+ d6 = tmp13 - z1;
+
+ // Odd part
+ tmp10 = tmp4 + tmp5; // phase 2
+ tmp11 = tmp5 + tmp6;
+ tmp12 = tmp6 + tmp7;
+
+ // The rotator is modified from fig 4-8 to avoid extra negations.
+ z5 = (tmp10 - tmp12) * 0.382683433f; // c6
+ z2 = tmp10 * 0.541196100f + z5; // c2-c6
+ z4 = tmp12 * 1.306562965f + z5; // c2+c6
+ z3 = tmp11 * 0.707106781f; // c4
+
+ z11 = tmp7 + z3; // phase 5
+ z13 = tmp7 - z3;
+
+ *d5p = z13 + z2; // phase 6
+ *d3p = z13 - z2;
+ *d1p = z11 + z4;
+ *d7p = z11 - z4;
+
+ *d0p = d0; *d2p = d2; *d4p = d4; *d6p = d6;
+}
+
+static void stbiw__jpg_calcBits(int val, unsigned short bits[2]) {
+ int tmp1 = val < 0 ? -val : val;
+ val = val < 0 ? val-1 : val;
+ bits[1] = 1;
+ while(tmp1 >>= 1) {
+ ++bits[1];
+ }
+ bits[0] = val & ((1<<bits[1])-1);
+}
+
+static int stbiw__jpg_processDU(stbi__write_context *s, int *bitBuf, int *bitCnt, float *CDU, int du_stride, float *fdtbl, int DC, const unsigned short HTDC[256][2], const unsigned short HTAC[256][2]) {
+ const unsigned short EOB[2] = { HTAC[0x00][0], HTAC[0x00][1] };
+ const unsigned short M16zeroes[2] = { HTAC[0xF0][0], HTAC[0xF0][1] };
+ int dataOff, i, j, n, diff, end0pos, x, y;
+ int DU[64];
+
+ // DCT rows
+ for(dataOff=0, n=du_stride*8; dataOff<n; dataOff+=du_stride) {
+ stbiw__jpg_DCT(&CDU[dataOff], &CDU[dataOff+1], &CDU[dataOff+2], &CDU[dataOff+3], &CDU[dataOff+4], &CDU[dataOff+5], &CDU[dataOff+6], &CDU[dataOff+7]);
+ }
+ // DCT columns
+ for(dataOff=0; dataOff<8; ++dataOff) {
+ stbiw__jpg_DCT(&CDU[dataOff], &CDU[dataOff+du_stride], &CDU[dataOff+du_stride*2], &CDU[dataOff+du_stride*3], &CDU[dataOff+du_stride*4], &CDU[dataOff+du_stride*5], &CDU[dataOff+du_stride*6], &CDU[dataOff+du_stride*7]);
+ }
+ // Quantize/descale/zigzag the coefficients
+ for(y = 0, j=0; y < 8; ++y) {
+ for(x = 0; x < 8; ++x,++j) {
+ float v;
+ i = y*du_stride+x;
+ v = CDU[i]*fdtbl[j];
+ // ceilf() and floorf() are C99, not C89, so roll our own.
+ DU[stbiw__jpg_ZigZag[j]] = (int)(v < 0 ? v - 0.5f : v + 0.5f);
+ }
+ }
+
+ // Encode DC
+ diff = DU[0] - DC;
+ if (diff == 0) {
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, HTDC[0]);
+ } else {
+ unsigned short bits[2];
+ stbiw__jpg_calcBits(diff, bits);
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, HTDC[bits[1]]);
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, bits);
+ }
+ // Encode ACs
+ end0pos = 63;
+ for(; (end0pos>0)&&(DU[end0pos]==0); --end0pos) {
+ }
+ // end0pos = first element in reverse order !=0
+ if(end0pos == 0) {
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, EOB);
+ return DU[0];
+ }
+ for(i = 1; i <= end0pos; ++i) {
+ int startpos = i;
+ int nrzeroes;
+ unsigned short bits[2];
+ for (; DU[i]==0 && i<=end0pos; ++i) {
+ }
+ nrzeroes = i-startpos;
+ if ( nrzeroes >= 16 ) {
+ int lng = nrzeroes>>4;
+ int nrmarker;
+ for (nrmarker=1; nrmarker <= lng; ++nrmarker)
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, M16zeroes);
+ nrzeroes &= 15;
+ }
+ stbiw__jpg_calcBits(DU[i], bits);
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, HTAC[(nrzeroes<<4)+bits[1]]);
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, bits);
+ }
+ if(end0pos != 63) {
+ stbiw__jpg_writeBits(s, bitBuf, bitCnt, EOB);
+ }
+ return DU[0];
+}
+
+static int stbi_write_jpg_core(stbi__write_context *s, int width, int height, int comp, const void* data, int quality) {
+ // Constants that don't pollute global namespace
+ static const unsigned char std_dc_luminance_nrcodes[] = {0,0,1,5,1,1,1,1,1,1,0,0,0,0,0,0,0};
+ static const unsigned char std_dc_luminance_values[] = {0,1,2,3,4,5,6,7,8,9,10,11};
+ static const unsigned char std_ac_luminance_nrcodes[] = {0,0,2,1,3,3,2,4,3,5,5,4,4,0,0,1,0x7d};
+ static const unsigned char std_ac_luminance_values[] = {
+ 0x01,0x02,0x03,0x00,0x04,0x11,0x05,0x12,0x21,0x31,0x41,0x06,0x13,0x51,0x61,0x07,0x22,0x71,0x14,0x32,0x81,0x91,0xa1,0x08,
+ 0x23,0x42,0xb1,0xc1,0x15,0x52,0xd1,0xf0,0x24,0x33,0x62,0x72,0x82,0x09,0x0a,0x16,0x17,0x18,0x19,0x1a,0x25,0x26,0x27,0x28,
+ 0x29,0x2a,0x34,0x35,0x36,0x37,0x38,0x39,0x3a,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x53,0x54,0x55,0x56,0x57,0x58,0x59,
+ 0x5a,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x83,0x84,0x85,0x86,0x87,0x88,0x89,
+ 0x8a,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xb2,0xb3,0xb4,0xb5,0xb6,
+ 0xb7,0xb8,0xb9,0xba,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,0xe1,0xe2,
+ 0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa
+ };
+ static const unsigned char std_dc_chrominance_nrcodes[] = {0,0,3,1,1,1,1,1,1,1,1,1,0,0,0,0,0};
+ static const unsigned char std_dc_chrominance_values[] = {0,1,2,3,4,5,6,7,8,9,10,11};
+ static const unsigned char std_ac_chrominance_nrcodes[] = {0,0,2,1,2,4,4,3,4,7,5,4,4,0,1,2,0x77};
+ static const unsigned char std_ac_chrominance_values[] = {
+ 0x00,0x01,0x02,0x03,0x11,0x04,0x05,0x21,0x31,0x06,0x12,0x41,0x51,0x07,0x61,0x71,0x13,0x22,0x32,0x81,0x08,0x14,0x42,0x91,
+ 0xa1,0xb1,0xc1,0x09,0x23,0x33,0x52,0xf0,0x15,0x62,0x72,0xd1,0x0a,0x16,0x24,0x34,0xe1,0x25,0xf1,0x17,0x18,0x19,0x1a,0x26,
+ 0x27,0x28,0x29,0x2a,0x35,0x36,0x37,0x38,0x39,0x3a,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x53,0x54,0x55,0x56,0x57,0x58,
+ 0x59,0x5a,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x82,0x83,0x84,0x85,0x86,0x87,
+ 0x88,0x89,0x8a,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xb2,0xb3,0xb4,
+ 0xb5,0xb6,0xb7,0xb8,0xb9,0xba,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,
+ 0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa
+ };
+ // Huffman tables
+ static const unsigned short YDC_HT[256][2] = { {0,2},{2,3},{3,3},{4,3},{5,3},{6,3},{14,4},{30,5},{62,6},{126,7},{254,8},{510,9}};
+ static const unsigned short UVDC_HT[256][2] = { {0,2},{1,2},{2,2},{6,3},{14,4},{30,5},{62,6},{126,7},{254,8},{510,9},{1022,10},{2046,11}};
+ static const unsigned short YAC_HT[256][2] = {
+ {10,4},{0,2},{1,2},{4,3},{11,4},{26,5},{120,7},{248,8},{1014,10},{65410,16},{65411,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {12,4},{27,5},{121,7},{502,9},{2038,11},{65412,16},{65413,16},{65414,16},{65415,16},{65416,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {28,5},{249,8},{1015,10},{4084,12},{65417,16},{65418,16},{65419,16},{65420,16},{65421,16},{65422,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {58,6},{503,9},{4085,12},{65423,16},{65424,16},{65425,16},{65426,16},{65427,16},{65428,16},{65429,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {59,6},{1016,10},{65430,16},{65431,16},{65432,16},{65433,16},{65434,16},{65435,16},{65436,16},{65437,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {122,7},{2039,11},{65438,16},{65439,16},{65440,16},{65441,16},{65442,16},{65443,16},{65444,16},{65445,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {123,7},{4086,12},{65446,16},{65447,16},{65448,16},{65449,16},{65450,16},{65451,16},{65452,16},{65453,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {250,8},{4087,12},{65454,16},{65455,16},{65456,16},{65457,16},{65458,16},{65459,16},{65460,16},{65461,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {504,9},{32704,15},{65462,16},{65463,16},{65464,16},{65465,16},{65466,16},{65467,16},{65468,16},{65469,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {505,9},{65470,16},{65471,16},{65472,16},{65473,16},{65474,16},{65475,16},{65476,16},{65477,16},{65478,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {506,9},{65479,16},{65480,16},{65481,16},{65482,16},{65483,16},{65484,16},{65485,16},{65486,16},{65487,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {1017,10},{65488,16},{65489,16},{65490,16},{65491,16},{65492,16},{65493,16},{65494,16},{65495,16},{65496,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {1018,10},{65497,16},{65498,16},{65499,16},{65500,16},{65501,16},{65502,16},{65503,16},{65504,16},{65505,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {2040,11},{65506,16},{65507,16},{65508,16},{65509,16},{65510,16},{65511,16},{65512,16},{65513,16},{65514,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {65515,16},{65516,16},{65517,16},{65518,16},{65519,16},{65520,16},{65521,16},{65522,16},{65523,16},{65524,16},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {2041,11},{65525,16},{65526,16},{65527,16},{65528,16},{65529,16},{65530,16},{65531,16},{65532,16},{65533,16},{65534,16},{0,0},{0,0},{0,0},{0,0},{0,0}
+ };
+ static const unsigned short UVAC_HT[256][2] = {
+ {0,2},{1,2},{4,3},{10,4},{24,5},{25,5},{56,6},{120,7},{500,9},{1014,10},{4084,12},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {11,4},{57,6},{246,8},{501,9},{2038,11},{4085,12},{65416,16},{65417,16},{65418,16},{65419,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {26,5},{247,8},{1015,10},{4086,12},{32706,15},{65420,16},{65421,16},{65422,16},{65423,16},{65424,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {27,5},{248,8},{1016,10},{4087,12},{65425,16},{65426,16},{65427,16},{65428,16},{65429,16},{65430,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {58,6},{502,9},{65431,16},{65432,16},{65433,16},{65434,16},{65435,16},{65436,16},{65437,16},{65438,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {59,6},{1017,10},{65439,16},{65440,16},{65441,16},{65442,16},{65443,16},{65444,16},{65445,16},{65446,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {121,7},{2039,11},{65447,16},{65448,16},{65449,16},{65450,16},{65451,16},{65452,16},{65453,16},{65454,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {122,7},{2040,11},{65455,16},{65456,16},{65457,16},{65458,16},{65459,16},{65460,16},{65461,16},{65462,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {249,8},{65463,16},{65464,16},{65465,16},{65466,16},{65467,16},{65468,16},{65469,16},{65470,16},{65471,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {503,9},{65472,16},{65473,16},{65474,16},{65475,16},{65476,16},{65477,16},{65478,16},{65479,16},{65480,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {504,9},{65481,16},{65482,16},{65483,16},{65484,16},{65485,16},{65486,16},{65487,16},{65488,16},{65489,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {505,9},{65490,16},{65491,16},{65492,16},{65493,16},{65494,16},{65495,16},{65496,16},{65497,16},{65498,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {506,9},{65499,16},{65500,16},{65501,16},{65502,16},{65503,16},{65504,16},{65505,16},{65506,16},{65507,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {2041,11},{65508,16},{65509,16},{65510,16},{65511,16},{65512,16},{65513,16},{65514,16},{65515,16},{65516,16},{0,0},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {16352,14},{65517,16},{65518,16},{65519,16},{65520,16},{65521,16},{65522,16},{65523,16},{65524,16},{65525,16},{0,0},{0,0},{0,0},{0,0},{0,0},
+ {1018,10},{32707,15},{65526,16},{65527,16},{65528,16},{65529,16},{65530,16},{65531,16},{65532,16},{65533,16},{65534,16},{0,0},{0,0},{0,0},{0,0},{0,0}
+ };
+ static const int YQT[] = {16,11,10,16,24,40,51,61,12,12,14,19,26,58,60,55,14,13,16,24,40,57,69,56,14,17,22,29,51,87,80,62,18,22,
+ 37,56,68,109,103,77,24,35,55,64,81,104,113,92,49,64,78,87,103,121,120,101,72,92,95,98,112,100,103,99};
+ static const int UVQT[] = {17,18,24,47,99,99,99,99,18,21,26,66,99,99,99,99,24,26,56,99,99,99,99,99,47,66,99,99,99,99,99,99,
+ 99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99,99};
+ static const float aasf[] = { 1.0f * 2.828427125f, 1.387039845f * 2.828427125f, 1.306562965f * 2.828427125f, 1.175875602f * 2.828427125f,
+ 1.0f * 2.828427125f, 0.785694958f * 2.828427125f, 0.541196100f * 2.828427125f, 0.275899379f * 2.828427125f };
+
+ int row, col, i, k, subsample;
+ float fdtbl_Y[64], fdtbl_UV[64];
+ unsigned char YTable[64], UVTable[64];
+
+ if(!data || !width || !height || comp > 4 || comp < 1) {
+ return 0;
+ }
+
+ quality = quality ? quality : 90;
+ subsample = quality <= 90 ? 1 : 0;
+ quality = quality < 1 ? 1 : quality > 100 ? 100 : quality;
+ quality = quality < 50 ? 5000 / quality : 200 - quality * 2;
+
+ for(i = 0; i < 64; ++i) {
+ int uvti, yti = (YQT[i]*quality+50)/100;
+ YTable[stbiw__jpg_ZigZag[i]] = (unsigned char) (yti < 1 ? 1 : yti > 255 ? 255 : yti);
+ uvti = (UVQT[i]*quality+50)/100;
+ UVTable[stbiw__jpg_ZigZag[i]] = (unsigned char) (uvti < 1 ? 1 : uvti > 255 ? 255 : uvti);
+ }
+
+ for(row = 0, k = 0; row < 8; ++row) {
+ for(col = 0; col < 8; ++col, ++k) {
+ fdtbl_Y[k] = 1 / (YTable [stbiw__jpg_ZigZag[k]] * aasf[row] * aasf[col]);
+ fdtbl_UV[k] = 1 / (UVTable[stbiw__jpg_ZigZag[k]] * aasf[row] * aasf[col]);
+ }
+ }
+
+ // Write Headers
+ {
+ static const unsigned char head0[] = { 0xFF,0xD8,0xFF,0xE0,0,0x10,'J','F','I','F',0,1,1,0,0,1,0,1,0,0,0xFF,0xDB,0,0x84,0 };
+ static const unsigned char head2[] = { 0xFF,0xDA,0,0xC,3,1,0,2,0x11,3,0x11,0,0x3F,0 };
+ const unsigned char head1[] = { 0xFF,0xC0,0,0x11,8,(unsigned char)(height>>8),STBIW_UCHAR(height),(unsigned char)(width>>8),STBIW_UCHAR(width),
+ 3,1,(unsigned char)(subsample?0x22:0x11),0,2,0x11,1,3,0x11,1,0xFF,0xC4,0x01,0xA2,0 };
+ s->func(s->context, (void*)head0, sizeof(head0));
+ s->func(s->context, (void*)YTable, sizeof(YTable));
+ stbiw__putc(s, 1);
+ s->func(s->context, UVTable, sizeof(UVTable));
+ s->func(s->context, (void*)head1, sizeof(head1));
+ s->func(s->context, (void*)(std_dc_luminance_nrcodes+1), sizeof(std_dc_luminance_nrcodes)-1);
+ s->func(s->context, (void*)std_dc_luminance_values, sizeof(std_dc_luminance_values));
+ stbiw__putc(s, 0x10); // HTYACinfo
+ s->func(s->context, (void*)(std_ac_luminance_nrcodes+1), sizeof(std_ac_luminance_nrcodes)-1);
+ s->func(s->context, (void*)std_ac_luminance_values, sizeof(std_ac_luminance_values));
+ stbiw__putc(s, 1); // HTUDCinfo
+ s->func(s->context, (void*)(std_dc_chrominance_nrcodes+1), sizeof(std_dc_chrominance_nrcodes)-1);
+ s->func(s->context, (void*)std_dc_chrominance_values, sizeof(std_dc_chrominance_values));
+ stbiw__putc(s, 0x11); // HTUACinfo
+ s->func(s->context, (void*)(std_ac_chrominance_nrcodes+1), sizeof(std_ac_chrominance_nrcodes)-1);
+ s->func(s->context, (void*)std_ac_chrominance_values, sizeof(std_ac_chrominance_values));
+ s->func(s->context, (void*)head2, sizeof(head2));
+ }
+
+ // Encode 8x8 macroblocks
+ {
+ static const unsigned short fillBits[] = {0x7F, 7};
+ int DCY=0, DCU=0, DCV=0;
+ int bitBuf=0, bitCnt=0;
+ // comp == 2 is grey+alpha (alpha is ignored)
+ int ofsG = comp > 2 ? 1 : 0, ofsB = comp > 2 ? 2 : 0;
+ const unsigned char *dataR = (const unsigned char *)data;
+ const unsigned char *dataG = dataR + ofsG;
+ const unsigned char *dataB = dataR + ofsB;
+ int x, y, pos;
+ if(subsample) {
+ for(y = 0; y < height; y += 16) {
+ for(x = 0; x < width; x += 16) {
+ float Y[256], U[256], V[256];
+ for(row = y, pos = 0; row < y+16; ++row) {
+ // row >= height => use last input row
+ int clamped_row = (row < height) ? row : height - 1;
+ int base_p = (stbi__flip_vertically_on_write ? (height-1-clamped_row) : clamped_row)*width*comp;
+ for(col = x; col < x+16; ++col, ++pos) {
+ // if col >= width => use pixel from last input column
+ int p = base_p + ((col < width) ? col : (width-1))*comp;
+ float r = dataR[p], g = dataG[p], b = dataB[p];
+ Y[pos]= +0.29900f*r + 0.58700f*g + 0.11400f*b - 128;
+ U[pos]= -0.16874f*r - 0.33126f*g + 0.50000f*b;
+ V[pos]= +0.50000f*r - 0.41869f*g - 0.08131f*b;
+ }
+ }
+ DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+0, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT);
+ DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+8, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT);
+ DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+128, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT);
+ DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+136, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT);
+
+ // subsample U,V
+ {
+ float subU[64], subV[64];
+ int yy, xx;
+ for(yy = 0, pos = 0; yy < 8; ++yy) {
+ for(xx = 0; xx < 8; ++xx, ++pos) {
+ int j = yy*32+xx*2;
+ subU[pos] = (U[j+0] + U[j+1] + U[j+16] + U[j+17]) * 0.25f;
+ subV[pos] = (V[j+0] + V[j+1] + V[j+16] + V[j+17]) * 0.25f;
+ }
+ }
+ DCU = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, subU, 8, fdtbl_UV, DCU, UVDC_HT, UVAC_HT);
+ DCV = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, subV, 8, fdtbl_UV, DCV, UVDC_HT, UVAC_HT);
+ }
+ }
+ }
+ } else {
+ for(y = 0; y < height; y += 8) {
+ for(x = 0; x < width; x += 8) {
+ float Y[64], U[64], V[64];
+ for(row = y, pos = 0; row < y+8; ++row) {
+ // row >= height => use last input row
+ int clamped_row = (row < height) ? row : height - 1;
+ int base_p = (stbi__flip_vertically_on_write ? (height-1-clamped_row) : clamped_row)*width*comp;
+ for(col = x; col < x+8; ++col, ++pos) {
+ // if col >= width => use pixel from last input column
+ int p = base_p + ((col < width) ? col : (width-1))*comp;
+ float r = dataR[p], g = dataG[p], b = dataB[p];
+ Y[pos]= +0.29900f*r + 0.58700f*g + 0.11400f*b - 128;
+ U[pos]= -0.16874f*r - 0.33126f*g + 0.50000f*b;
+ V[pos]= +0.50000f*r - 0.41869f*g - 0.08131f*b;
+ }
+ }
+
+ DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y, 8, fdtbl_Y, DCY, YDC_HT, YAC_HT);
+ DCU = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, U, 8, fdtbl_UV, DCU, UVDC_HT, UVAC_HT);
+ DCV = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, V, 8, fdtbl_UV, DCV, UVDC_HT, UVAC_HT);
+ }
+ }
+ }
+
+ // Do the bit alignment of the EOI marker
+ stbiw__jpg_writeBits(s, &bitBuf, &bitCnt, fillBits);
+ }
+
+ // EOI
+ stbiw__putc(s, 0xFF);
+ stbiw__putc(s, 0xD9);
+
+ return 1;
+}
+
+STBIWDEF int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int quality)
+{
+ stbi__write_context s = { 0 };
+ stbi__start_write_callbacks(&s, func, context);
+ return stbi_write_jpg_core(&s, x, y, comp, (void *) data, quality);
+}
+
+
+#ifndef STBI_WRITE_NO_STDIO
+STBIWDEF int stbi_write_jpg(char const *filename, int x, int y, int comp, const void *data, int quality)
+{
+ stbi__write_context s = { 0 };
+ if (stbi__start_write_file(&s,filename)) {
+ int r = stbi_write_jpg_core(&s, x, y, comp, data, quality);
+ stbi__end_write_file(&s);
+ return r;
+ } else
+ return 0;
+}
+#endif
+
+#endif // STB_IMAGE_WRITE_IMPLEMENTATION
+
+/* Revision history
+ 1.16 (2021-07-11)
+ make Deflate code emit uncompressed blocks when it would otherwise expand
+ support writing BMPs with alpha channel
+ 1.15 (2020-07-13) unknown
+ 1.14 (2020-02-02) updated JPEG writer to downsample chroma channels
+ 1.13
+ 1.12
+ 1.11 (2019-08-11)
+
+ 1.10 (2019-02-07)
+ support utf8 filenames in Windows; fix warnings and platform ifdefs
+ 1.09 (2018-02-11)
+ fix typo in zlib quality API, improve STB_I_W_STATIC in C++
+ 1.08 (2018-01-29)
+ add stbi__flip_vertically_on_write, external zlib, zlib quality, choose PNG filter
+ 1.07 (2017-07-24)
+ doc fix
+ 1.06 (2017-07-23)
+ writing JPEG (using Jon Olick's code)
+ 1.05 ???
+ 1.04 (2017-03-03)
+ monochrome BMP expansion
+ 1.03 ???
+ 1.02 (2016-04-02)
+ avoid allocating large structures on the stack
+ 1.01 (2016-01-16)
+ STBIW_REALLOC_SIZED: support allocators with no realloc support
+ avoid race-condition in crc initialization
+ minor compile issues
+ 1.00 (2015-09-14)
+ installable file IO function
+ 0.99 (2015-09-13)
+ warning fixes; TGA rle support
+ 0.98 (2015-04-08)
+ added STBIW_MALLOC, STBIW_ASSERT etc
+ 0.97 (2015-01-18)
+ fixed HDR asserts, rewrote HDR rle logic
+ 0.96 (2015-01-17)
+ add HDR output
+ fix monochrome BMP
+ 0.95 (2014-08-17)
+ add monochrome TGA output
+ 0.94 (2014-05-31)
+ rename private functions to avoid conflicts with stb_image.h
+ 0.93 (2014-05-27)
+ warning fixes
+ 0.92 (2010-08-01)
+ casts to unsigned char to fix warnings
+ 0.91 (2010-07-17)
+ first public release
+ 0.90 first internal release
+*/
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_include.h b/vendor/stb/stb_include.h
new file mode 100644
index 0000000..c5db201
--- /dev/null
+++ b/vendor/stb/stb_include.h
@@ -0,0 +1,295 @@
+// stb_include.h - v0.02 - parse and process #include directives - public domain
+//
+// To build this, in one source file that includes this file do
+// #define STB_INCLUDE_IMPLEMENTATION
+//
+// This program parses a string and replaces lines of the form
+// #include "foo"
+// with the contents of a file named "foo". It also embeds the
+// appropriate #line directives. Note that all include files must
+// reside in the location specified in the path passed to the API;
+// it does not check multiple directories.
+//
+// If the string contains a line of the form
+// #inject
+// then it will be replaced with the contents of the string 'inject' passed to the API.
+//
+// Options:
+//
+// Define STB_INCLUDE_LINE_GLSL to get GLSL-style #line directives
+// which use numbers instead of filenames.
+//
+// Define STB_INCLUDE_LINE_NONE to disable output of #line directives.
+//
+// Standard libraries:
+//
+// stdio.h FILE, fopen, fclose, fseek, ftell
+// stdlib.h malloc, realloc, free
+// string.h strcpy, strncmp, memcpy
+//
+// Credits:
+//
+// Written by Sean Barrett.
+//
+// Fixes:
+// Michal Klos
+
+#ifndef STB_INCLUDE_STB_INCLUDE_H
+#define STB_INCLUDE_STB_INCLUDE_H
+
+// Do include-processing on the string 'str'. To free the return value, pass it to free()
+char *stb_include_string(char *str, char *inject, char *path_to_includes, char *filename_for_line_directive, char error[256]);
+
+// Concatenate the strings 'strs' and do include-processing on the result. To free the return value, pass it to free()
+char *stb_include_strings(char **strs, int count, char *inject, char *path_to_includes, char *filename_for_line_directive, char error[256]);
+
+// Load the file 'filename' and do include-processing on the string therein. note that
+// 'filename' is opened directly; 'path_to_includes' is not used. To free the return value, pass it to free()
+char *stb_include_file(char *filename, char *inject, char *path_to_includes, char error[256]);
+
+#endif
+
+
+#ifdef STB_INCLUDE_IMPLEMENTATION
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static char *stb_include_load_file(char *filename, size_t *plen)
+{
+ char *text;
+ size_t len;
+ FILE *f = fopen(filename, "rb");
+ if (f == 0) return 0;
+ fseek(f, 0, SEEK_END);
+ len = (size_t) ftell(f);
+ if (plen) *plen = len;
+ text = (char *) malloc(len+1);
+ if (text == 0) return 0;
+ fseek(f, 0, SEEK_SET);
+ fread(text, 1, len, f);
+ fclose(f);
+ text[len] = 0;
+ return text;
+}
+
+typedef struct
+{
+ int offset;
+ int end;
+ char *filename;
+ int next_line_after;
+} include_info;
+
+static include_info *stb_include_append_include(include_info *array, int len, int offset, int end, char *filename, int next_line)
+{
+ include_info *z = (include_info *) realloc(array, sizeof(*z) * (len+1));
+ z[len].offset = offset;
+ z[len].end = end;
+ z[len].filename = filename;
+ z[len].next_line_after = next_line;
+ return z;
+}
+
+static void stb_include_free_includes(include_info *array, int len)
+{
+ int i;
+ for (i=0; i < len; ++i)
+ free(array[i].filename);
+ free(array);
+}
+
+static int stb_include_isspace(int ch)
+{
+ return (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n');
+}
+
+// find location of all #include and #inject
+static int stb_include_find_includes(char *text, include_info **plist)
+{
+ int line_count = 1;
+ int inc_count = 0;
+ char *s = text, *start;
+ include_info *list = NULL;
+ while (*s) {
+ // parse is always at start of line when we reach here
+ start = s;
+ while (*s == ' ' || *s == '\t')
+ ++s;
+ if (*s == '#') {
+ ++s;
+ while (*s == ' ' || *s == '\t')
+ ++s;
+ if (0==strncmp(s, "include", 7) && stb_include_isspace(s[7])) {
+ s += 7;
+ while (*s == ' ' || *s == '\t')
+ ++s;
+ if (*s == '"') {
+ char *t = ++s;
+ while (*t != '"' && *t != '\n' && *t != '\r' && *t != 0)
+ ++t;
+ if (*t == '"') {
+ char *filename = (char *) malloc(t-s+1);
+ memcpy(filename, s, t-s);
+ filename[t-s] = 0;
+ s=t;
+ while (*s != '\r' && *s != '\n' && *s != 0)
+ ++s;
+ // s points to the newline, so s-start is everything except the newline
+ list = stb_include_append_include(list, inc_count++, start-text, s-text, filename, line_count+1);
+ }
+ }
+ } else if (0==strncmp(s, "inject", 6) && (stb_include_isspace(s[6]) || s[6]==0)) {
+ while (*s != '\r' && *s != '\n' && *s != 0)
+ ++s;
+ list = stb_include_append_include(list, inc_count++, start-text, s-text, NULL, line_count+1);
+ }
+ }
+ while (*s != '\r' && *s != '\n' && *s != 0)
+ ++s;
+ if (*s == '\r' || *s == '\n') {
+ s = s + (s[0] + s[1] == '\r' + '\n' ? 2 : 1);
+ }
+ ++line_count;
+ }
+ *plist = list;
+ return inc_count;
+}
+
+// avoid dependency on sprintf()
+static void stb_include_itoa(char str[9], int n)
+{
+ int i;
+ for (i=0; i < 8; ++i)
+ str[i] = ' ';
+ str[i] = 0;
+
+ for (i=1; i < 8; ++i) {
+ str[7-i] = '0' + (n % 10);
+ n /= 10;
+ if (n == 0)
+ break;
+ }
+}
+
+static char *stb_include_append(char *str, size_t *curlen, char *addstr, size_t addlen)
+{
+ str = (char *) realloc(str, *curlen + addlen);
+ memcpy(str + *curlen, addstr, addlen);
+ *curlen += addlen;
+ return str;
+}
+
+char *stb_include_string(char *str, char *inject, char *path_to_includes, char *filename, char error[256])
+{
+ char temp[4096];
+ include_info *inc_list;
+ int i, num = stb_include_find_includes(str, &inc_list);
+ size_t source_len = strlen(str);
+ char *text=0;
+ size_t textlen=0, last=0;
+ for (i=0; i < num; ++i) {
+ text = stb_include_append(text, &textlen, str+last, inc_list[i].offset - last);
+ // write out line directive for the include
+ #ifndef STB_INCLUDE_LINE_NONE
+ #ifdef STB_INCLUDE_LINE_GLSL
+ if (textlen != 0) // GLSL #version must appear first, so don't put a #line at the top
+ #endif
+ {
+ strcpy(temp, "#line ");
+ stb_include_itoa(temp+6, 1);
+ strcat(temp, " ");
+ #ifdef STB_INCLUDE_LINE_GLSL
+ stb_include_itoa(temp+15, i+1);
+ #else
+ strcat(temp, "\"");
+ if (inc_list[i].filename == 0)
+ strcat(temp, "INJECT");
+ else
+ strcat(temp, inc_list[i].filename);
+ strcat(temp, "\"");
+ #endif
+ strcat(temp, "\n");
+ text = stb_include_append(text, &textlen, temp, strlen(temp));
+ }
+ #endif
+ if (inc_list[i].filename == 0) {
+ if (inject != 0)
+ text = stb_include_append(text, &textlen, inject, strlen(inject));
+ } else {
+ char *inc;
+ strcpy(temp, path_to_includes);
+ strcat(temp, "/");
+ strcat(temp, inc_list[i].filename);
+ inc = stb_include_file(temp, inject, path_to_includes, error);
+ if (inc == NULL) {
+ stb_include_free_includes(inc_list, num);
+ return NULL;
+ }
+ text = stb_include_append(text, &textlen, inc, strlen(inc));
+ free(inc);
+ }
+ // write out line directive
+ #ifndef STB_INCLUDE_LINE_NONE
+ strcpy(temp, "\n#line ");
+ stb_include_itoa(temp+6, inc_list[i].next_line_after);
+ strcat(temp, " ");
+ #ifdef STB_INCLUDE_LINE_GLSL
+ stb_include_itoa(temp+15, 0);
+ #else
+ strcat(temp, filename != 0 ? filename : "source-file");
+ #endif
+ text = stb_include_append(text, &textlen, temp, strlen(temp));
+ // no newlines, because we kept the #include newlines, which will get appended next
+ #endif
+ last = inc_list[i].end;
+ }
+ text = stb_include_append(text, &textlen, str+last, source_len - last + 1); // append '\0'
+ stb_include_free_includes(inc_list, num);
+ return text;
+}
+
+char *stb_include_strings(char **strs, int count, char *inject, char *path_to_includes, char *filename, char error[256])
+{
+ char *text;
+ char *result;
+ int i;
+ size_t length=0;
+ for (i=0; i < count; ++i)
+ length += strlen(strs[i]);
+ text = (char *) malloc(length+1);
+ length = 0;
+ for (i=0; i < count; ++i) {
+ strcpy(text + length, strs[i]);
+ length += strlen(strs[i]);
+ }
+ result = stb_include_string(text, inject, path_to_includes, filename, error);
+ free(text);
+ return result;
+}
+
+char *stb_include_file(char *filename, char *inject, char *path_to_includes, char error[256])
+{
+ size_t len;
+ char *result;
+ char *text = stb_include_load_file(filename, &len);
+ if (text == NULL) {
+ strcpy(error, "Error: couldn't load '");
+ strcat(error, filename);
+ strcat(error, "'");
+ return 0;
+ }
+ result = stb_include_string(text, inject, path_to_includes, filename, error);
+ free(text);
+ return result;
+}
+
+#if 0 // @TODO, GL_ARB_shader_language_include-style system that doesn't touch filesystem
+char *stb_include_preloaded(char *str, char *inject, char *includes[][2], char error[256])
+{
+
+}
+#endif
+
+#endif // STB_INCLUDE_IMPLEMENTATION
diff --git a/vendor/stb/stb_leakcheck.h b/vendor/stb/stb_leakcheck.h
new file mode 100644
index 0000000..19ee6e7
--- /dev/null
+++ b/vendor/stb/stb_leakcheck.h
@@ -0,0 +1,194 @@
+// stb_leakcheck.h - v0.6 - quick & dirty malloc leak-checking - public domain
+// LICENSE
+//
+// See end of file.
+
+#ifdef STB_LEAKCHECK_IMPLEMENTATION
+#undef STB_LEAKCHECK_IMPLEMENTATION // don't implement more than once
+
+// if we've already included leakcheck before, undefine the macros
+#ifdef malloc
+#undef malloc
+#undef free
+#undef realloc
+#endif
+
+#ifndef STB_LEAKCHECK_OUTPUT_PIPE
+#define STB_LEAKCHECK_OUTPUT_PIPE stdout
+#endif
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+typedef struct malloc_info stb_leakcheck_malloc_info;
+
+struct malloc_info
+{
+ const char *file;
+ int line;
+ size_t size;
+ stb_leakcheck_malloc_info *next,*prev;
+};
+
+static stb_leakcheck_malloc_info *mi_head;
+
+void *stb_leakcheck_malloc(size_t sz, const char *file, int line)
+{
+ stb_leakcheck_malloc_info *mi = (stb_leakcheck_malloc_info *) malloc(sz + sizeof(*mi));
+ if (mi == NULL) return mi;
+ mi->file = file;
+ mi->line = line;
+ mi->next = mi_head;
+ if (mi_head)
+ mi->next->prev = mi;
+ mi->prev = NULL;
+ mi->size = (int) sz;
+ mi_head = mi;
+ return mi+1;
+}
+
+void stb_leakcheck_free(void *ptr)
+{
+ if (ptr != NULL) {
+ stb_leakcheck_malloc_info *mi = (stb_leakcheck_malloc_info *) ptr - 1;
+ mi->size = ~mi->size;
+ #ifndef STB_LEAKCHECK_SHOWALL
+ if (mi->prev == NULL) {
+ assert(mi_head == mi);
+ mi_head = mi->next;
+ } else
+ mi->prev->next = mi->next;
+ if (mi->next)
+ mi->next->prev = mi->prev;
+ free(mi);
+ #endif
+ }
+}
+
+void *stb_leakcheck_realloc(void *ptr, size_t sz, const char *file, int line)
+{
+ if (ptr == NULL) {
+ return stb_leakcheck_malloc(sz, file, line);
+ } else if (sz == 0) {
+ stb_leakcheck_free(ptr);
+ return NULL;
+ } else {
+ stb_leakcheck_malloc_info *mi = (stb_leakcheck_malloc_info *) ptr - 1;
+ if (sz <= mi->size)
+ return ptr;
+ else {
+ #ifdef STB_LEAKCHECK_REALLOC_PRESERVE_MALLOC_FILELINE
+ void *q = stb_leakcheck_malloc(sz, mi->file, mi->line);
+ #else
+ void *q = stb_leakcheck_malloc(sz, file, line);
+ #endif
+ if (q) {
+ memcpy(q, ptr, mi->size);
+ stb_leakcheck_free(ptr);
+ }
+ return q;
+ }
+ }
+}
+
+static void stblkck_internal_print(const char *reason, stb_leakcheck_malloc_info *mi)
+{
+#if defined(_MSC_VER) && _MSC_VER < 1900 // 1900=VS 2015
+ // Compilers that use the old MS C runtime library don't have %zd
+ // and the older ones don't even have %lld either... however, the old compilers
+ // without "long long" don't support 64-bit targets either, so here's the
+ // compromise:
+ #if _MSC_VER < 1400 // before VS 2005
+ fprintf(STB_LEAKCHECK_OUTPUT_PIPE, "%s: %s (%4d): %8d bytes at %p\n", reason, mi->file, mi->line, (int)mi->size, (void*)(mi+1));
+ #else
+ fprintf(STB_LEAKCHECK_OUTPUT_PIPE, "%s: %s (%4d): %16lld bytes at %p\n", reason, mi->file, mi->line, (long long)mi->size, (void*)(mi+1));
+ #endif
+#else
+ // Assume we have %zd on other targets.
+ #ifdef __MINGW32__
+ __mingw_fprintf(STB_LEAKCHECK_OUTPUT_PIPE, "%s: %s (%4d): %zd bytes at %p\n", reason, mi->file, mi->line, mi->size, (void*)(mi+1));
+ #else
+ fprintf(STB_LEAKCHECK_OUTPUT_PIPE, "%s: %s (%4d): %zd bytes at %p\n", reason, mi->file, mi->line, mi->size, (void*)(mi+1));
+ #endif
+#endif
+}
+
+void stb_leakcheck_dumpmem(void)
+{
+ stb_leakcheck_malloc_info *mi = mi_head;
+ while (mi) {
+ if ((ptrdiff_t) mi->size >= 0)
+ stblkck_internal_print("LEAKED", mi);
+ mi = mi->next;
+ }
+ #ifdef STB_LEAKCHECK_SHOWALL
+ mi = mi_head;
+ while (mi) {
+ if ((ptrdiff_t) mi->size < 0)
+ stblkck_internal_print("FREED ", mi);
+ mi = mi->next;
+ }
+ #endif
+}
+#endif // STB_LEAKCHECK_IMPLEMENTATION
+
+#if !defined(INCLUDE_STB_LEAKCHECK_H) || !defined(malloc)
+#define INCLUDE_STB_LEAKCHECK_H
+
+#include <stdlib.h> // we want to define the macros *after* stdlib to avoid a slew of errors
+
+#define malloc(sz) stb_leakcheck_malloc(sz, __FILE__, __LINE__)
+#define free(p) stb_leakcheck_free(p)
+#define realloc(p,sz) stb_leakcheck_realloc(p,sz, __FILE__, __LINE__)
+
+extern void * stb_leakcheck_malloc(size_t sz, const char *file, int line);
+extern void * stb_leakcheck_realloc(void *ptr, size_t sz, const char *file, int line);
+extern void stb_leakcheck_free(void *ptr);
+extern void stb_leakcheck_dumpmem(void);
+
+#endif // INCLUDE_STB_LEAKCHECK_H
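The macros above redirect `malloc`/`free` to tracked versions that prepend a small bookkeeping header to each allocation and hand the caller the memory just past it. A self-contained sketch of that mechanism, with simplified hypothetical names (`track_malloc`, `leaked_bytes`) and a singly-linked list instead of the doubly-linked one used by stb_leakcheck:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the bookkeeping trick (NOT the real stb_leakcheck API):
   each allocation gets an info header prepended; the user pointer is
   the memory immediately after that header. */
typedef struct info {
   const char *file;
   int line;
   size_t size;
   struct info *next;
} info;

static info *head;

static void *track_malloc(size_t sz, const char *file, int line)
{
   info *mi = (info *) malloc(sz + sizeof(info));
   if (mi == NULL) return NULL;
   mi->file = file;
   mi->line = line;
   mi->size = sz;
   mi->next = head;
   head = mi;
   return mi + 1;                 /* user pointer starts after the header */
}

static void track_free(void *ptr)
{
   if (ptr == NULL) return;
   info *mi = (info *) ptr - 1;   /* step back to the header */
   info **p = &head;
   while (*p && *p != mi)         /* unlink by linear search (sketch only) */
      p = &(*p)->next;
   if (*p) *p = mi->next;
   free(mi);
}

static size_t leaked_bytes(void)  /* total size of still-live allocations */
{
   size_t total = 0;
   info *mi;
   for (mi = head; mi; mi = mi->next)
      total += mi->size;
   return total;
}

/* Defined after the functions, so the functions call the real allocator. */
#define malloc(sz)  track_malloc(sz, __FILE__, __LINE__)
#define free(p)     track_free(p)
```

Because the macros are defined last, user code picks up the tracked versions while the tracker itself still reaches the real `malloc`/`free`, which is the same ordering trick the header above relies on.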
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_perlin.h b/vendor/stb/stb_perlin.h
new file mode 100644
index 0000000..47cb9a4
--- /dev/null
+++ b/vendor/stb/stb_perlin.h
@@ -0,0 +1,428 @@
+// stb_perlin.h - v0.5 - perlin noise
+// public domain single-file C implementation by Sean Barrett
+//
+// LICENSE
+//
+// See end of file.
+//
+//
+// to create the implementation,
+// #define STB_PERLIN_IMPLEMENTATION
+// in *one* C/CPP file that includes this file.
+//
+//
+// Documentation:
+//
+// float stb_perlin_noise3( float x,
+// float y,
+// float z,
+// int x_wrap=0,
+// int y_wrap=0,
+// int z_wrap=0)
+//
+// This function computes a random value at the coordinate (x,y,z).
+// Adjacent random values are continuous but the noise fluctuates
+// its randomness with period 1, i.e. takes on wholly unrelated values
+// at integer points. Specifically, this implements Ken Perlin's
+// revised noise function from 2002.
+//
+// The "wrap" parameters can be used to create wraparound noise that
+// wraps at powers of two. The numbers MUST be powers of two. Specify
+// 0 to mean "don't care". (The noise always wraps every 256 due to
+// details of the implementation, even if you ask for larger or no
+// wrapping.)
+//
+// float stb_perlin_noise3_seed( float x,
+// float y,
+// float z,
+// int x_wrap=0,
+// int y_wrap=0,
+// int z_wrap=0,
+// int seed)
+//
+// As above, but 'seed' selects from multiple different variations of the
+// noise function. The current implementation only uses the bottom 8 bits
+// of 'seed', but possibly in the future more bits will be used.
+//
+//
+// Fractal Noise:
+//
+// Three common fractal noise functions are included, which produce
+// a wide variety of nice effects depending on the parameters
+// provided. Note that each function will call stb_perlin_noise3
+// 'octaves' times, so this parameter will affect runtime.
+//
+// float stb_perlin_ridge_noise3(float x, float y, float z,
+// float lacunarity, float gain, float offset, int octaves)
+//
+// float stb_perlin_fbm_noise3(float x, float y, float z,
+// float lacunarity, float gain, int octaves)
+//
+// float stb_perlin_turbulence_noise3(float x, float y, float z,
+// float lacunarity, float gain, int octaves)
+//
+// Typical values to start playing with:
+// octaves = 6 -- number of "octaves" of noise3() to sum
+// lacunarity = ~ 2.0 -- spacing between successive octaves (use exactly 2.0 for wrapping output)
+// gain = 0.5 -- relative weighting applied to each successive octave
+// offset = 1.0? -- used to invert the ridges, may need to be larger, not sure
+//
+//
+// Contributors:
+// Jack Mott - additional noise functions
+// Jordan Peck - seeded noise
+//
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+extern float stb_perlin_noise3(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap);
+extern float stb_perlin_noise3_seed(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap, int seed);
+extern float stb_perlin_ridge_noise3(float x, float y, float z, float lacunarity, float gain, float offset, int octaves);
+extern float stb_perlin_fbm_noise3(float x, float y, float z, float lacunarity, float gain, int octaves);
+extern float stb_perlin_turbulence_noise3(float x, float y, float z, float lacunarity, float gain, int octaves);
+extern float stb_perlin_noise3_wrap_nonpow2(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap, unsigned char seed);
+#ifdef __cplusplus
+}
+#endif
+
+#ifdef STB_PERLIN_IMPLEMENTATION
+
+#include <math.h> // fabs()
+
+// not same permutation table as Perlin's reference to avoid copyright issues;
+// Perlin's table can be found at http://mrl.nyu.edu/~perlin/noise/
+static unsigned char stb__perlin_randtab[512] =
+{
+ 23, 125, 161, 52, 103, 117, 70, 37, 247, 101, 203, 169, 124, 126, 44, 123,
+ 152, 238, 145, 45, 171, 114, 253, 10, 192, 136, 4, 157, 249, 30, 35, 72,
+ 175, 63, 77, 90, 181, 16, 96, 111, 133, 104, 75, 162, 93, 56, 66, 240,
+ 8, 50, 84, 229, 49, 210, 173, 239, 141, 1, 87, 18, 2, 198, 143, 57,
+ 225, 160, 58, 217, 168, 206, 245, 204, 199, 6, 73, 60, 20, 230, 211, 233,
+ 94, 200, 88, 9, 74, 155, 33, 15, 219, 130, 226, 202, 83, 236, 42, 172,
+ 165, 218, 55, 222, 46, 107, 98, 154, 109, 67, 196, 178, 127, 158, 13, 243,
+ 65, 79, 166, 248, 25, 224, 115, 80, 68, 51, 184, 128, 232, 208, 151, 122,
+ 26, 212, 105, 43, 179, 213, 235, 148, 146, 89, 14, 195, 28, 78, 112, 76,
+ 250, 47, 24, 251, 140, 108, 186, 190, 228, 170, 183, 139, 39, 188, 244, 246,
+ 132, 48, 119, 144, 180, 138, 134, 193, 82, 182, 120, 121, 86, 220, 209, 3,
+ 91, 241, 149, 85, 205, 150, 113, 216, 31, 100, 41, 164, 177, 214, 153, 231,
+ 38, 71, 185, 174, 97, 201, 29, 95, 7, 92, 54, 254, 191, 118, 34, 221,
+ 131, 11, 163, 99, 234, 81, 227, 147, 156, 176, 17, 142, 69, 12, 110, 62,
+ 27, 255, 0, 194, 59, 116, 242, 252, 19, 21, 187, 53, 207, 129, 64, 135,
+ 61, 40, 167, 237, 102, 223, 106, 159, 197, 189, 215, 137, 36, 32, 22, 5,
+
+ // and a second copy so we don't need an extra mask or static initializer
+ 23, 125, 161, 52, 103, 117, 70, 37, 247, 101, 203, 169, 124, 126, 44, 123,
+ 152, 238, 145, 45, 171, 114, 253, 10, 192, 136, 4, 157, 249, 30, 35, 72,
+ 175, 63, 77, 90, 181, 16, 96, 111, 133, 104, 75, 162, 93, 56, 66, 240,
+ 8, 50, 84, 229, 49, 210, 173, 239, 141, 1, 87, 18, 2, 198, 143, 57,
+ 225, 160, 58, 217, 168, 206, 245, 204, 199, 6, 73, 60, 20, 230, 211, 233,
+ 94, 200, 88, 9, 74, 155, 33, 15, 219, 130, 226, 202, 83, 236, 42, 172,
+ 165, 218, 55, 222, 46, 107, 98, 154, 109, 67, 196, 178, 127, 158, 13, 243,
+ 65, 79, 166, 248, 25, 224, 115, 80, 68, 51, 184, 128, 232, 208, 151, 122,
+ 26, 212, 105, 43, 179, 213, 235, 148, 146, 89, 14, 195, 28, 78, 112, 76,
+ 250, 47, 24, 251, 140, 108, 186, 190, 228, 170, 183, 139, 39, 188, 244, 246,
+ 132, 48, 119, 144, 180, 138, 134, 193, 82, 182, 120, 121, 86, 220, 209, 3,
+ 91, 241, 149, 85, 205, 150, 113, 216, 31, 100, 41, 164, 177, 214, 153, 231,
+ 38, 71, 185, 174, 97, 201, 29, 95, 7, 92, 54, 254, 191, 118, 34, 221,
+ 131, 11, 163, 99, 234, 81, 227, 147, 156, 176, 17, 142, 69, 12, 110, 62,
+ 27, 255, 0, 194, 59, 116, 242, 252, 19, 21, 187, 53, 207, 129, 64, 135,
+ 61, 40, 167, 237, 102, 223, 106, 159, 197, 189, 215, 137, 36, 32, 22, 5,
+};
+
+
+// perlin's gradient has 12 cases so some get used 1/16th of the time
+// and some 2/16ths. We reduce bias by changing those fractions
+// to 5/64ths and 6/64ths
+
+// this array is designed to match the previous implementation
+// of gradient hash: indices[stb__perlin_randtab[i]&63]
+static unsigned char stb__perlin_randtab_grad_idx[512] =
+{
+ 7, 9, 5, 0, 11, 1, 6, 9, 3, 9, 11, 1, 8, 10, 4, 7,
+ 8, 6, 1, 5, 3, 10, 9, 10, 0, 8, 4, 1, 5, 2, 7, 8,
+ 7, 11, 9, 10, 1, 0, 4, 7, 5, 0, 11, 6, 1, 4, 2, 8,
+ 8, 10, 4, 9, 9, 2, 5, 7, 9, 1, 7, 2, 2, 6, 11, 5,
+ 5, 4, 6, 9, 0, 1, 1, 0, 7, 6, 9, 8, 4, 10, 3, 1,
+ 2, 8, 8, 9, 10, 11, 5, 11, 11, 2, 6, 10, 3, 4, 2, 4,
+ 9, 10, 3, 2, 6, 3, 6, 10, 5, 3, 4, 10, 11, 2, 9, 11,
+ 1, 11, 10, 4, 9, 4, 11, 0, 4, 11, 4, 0, 0, 0, 7, 6,
+ 10, 4, 1, 3, 11, 5, 3, 4, 2, 9, 1, 3, 0, 1, 8, 0,
+ 6, 7, 8, 7, 0, 4, 6, 10, 8, 2, 3, 11, 11, 8, 0, 2,
+ 4, 8, 3, 0, 0, 10, 6, 1, 2, 2, 4, 5, 6, 0, 1, 3,
+ 11, 9, 5, 5, 9, 6, 9, 8, 3, 8, 1, 8, 9, 6, 9, 11,
+ 10, 7, 5, 6, 5, 9, 1, 3, 7, 0, 2, 10, 11, 2, 6, 1,
+ 3, 11, 7, 7, 2, 1, 7, 3, 0, 8, 1, 1, 5, 0, 6, 10,
+ 11, 11, 0, 2, 7, 0, 10, 8, 3, 5, 7, 1, 11, 1, 0, 7,
+ 9, 0, 11, 5, 10, 3, 2, 3, 5, 9, 7, 9, 8, 4, 6, 5,
+
+ // and a second copy so we don't need an extra mask or static initializer
+ 7, 9, 5, 0, 11, 1, 6, 9, 3, 9, 11, 1, 8, 10, 4, 7,
+ 8, 6, 1, 5, 3, 10, 9, 10, 0, 8, 4, 1, 5, 2, 7, 8,
+ 7, 11, 9, 10, 1, 0, 4, 7, 5, 0, 11, 6, 1, 4, 2, 8,
+ 8, 10, 4, 9, 9, 2, 5, 7, 9, 1, 7, 2, 2, 6, 11, 5,
+ 5, 4, 6, 9, 0, 1, 1, 0, 7, 6, 9, 8, 4, 10, 3, 1,
+ 2, 8, 8, 9, 10, 11, 5, 11, 11, 2, 6, 10, 3, 4, 2, 4,
+ 9, 10, 3, 2, 6, 3, 6, 10, 5, 3, 4, 10, 11, 2, 9, 11,
+ 1, 11, 10, 4, 9, 4, 11, 0, 4, 11, 4, 0, 0, 0, 7, 6,
+ 10, 4, 1, 3, 11, 5, 3, 4, 2, 9, 1, 3, 0, 1, 8, 0,
+ 6, 7, 8, 7, 0, 4, 6, 10, 8, 2, 3, 11, 11, 8, 0, 2,
+ 4, 8, 3, 0, 0, 10, 6, 1, 2, 2, 4, 5, 6, 0, 1, 3,
+ 11, 9, 5, 5, 9, 6, 9, 8, 3, 8, 1, 8, 9, 6, 9, 11,
+ 10, 7, 5, 6, 5, 9, 1, 3, 7, 0, 2, 10, 11, 2, 6, 1,
+ 3, 11, 7, 7, 2, 1, 7, 3, 0, 8, 1, 1, 5, 0, 6, 10,
+ 11, 11, 0, 2, 7, 0, 10, 8, 3, 5, 7, 1, 11, 1, 0, 7,
+ 9, 0, 11, 5, 10, 3, 2, 3, 5, 9, 7, 9, 8, 4, 6, 5,
+};
+
+static float stb__perlin_lerp(float a, float b, float t)
+{
+ return a + (b-a) * t;
+}
+
+static int stb__perlin_fastfloor(float a)
+{
+ int ai = (int) a;
+ return (a < ai) ? ai-1 : ai;
+}
+
+// different grad function from Perlin's, but easy to modify to match reference
+static float stb__perlin_grad(int grad_idx, float x, float y, float z)
+{
+ static float basis[12][4] =
+ {
+ { 1, 1, 0 },
+ { -1, 1, 0 },
+ { 1,-1, 0 },
+ { -1,-1, 0 },
+ { 1, 0, 1 },
+ { -1, 0, 1 },
+ { 1, 0,-1 },
+ { -1, 0,-1 },
+ { 0, 1, 1 },
+ { 0,-1, 1 },
+ { 0, 1,-1 },
+ { 0,-1,-1 },
+ };
+
+ float *grad = basis[grad_idx];
+ return grad[0]*x + grad[1]*y + grad[2]*z;
+}
+
+float stb_perlin_noise3_internal(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap, unsigned char seed)
+{
+ float u,v,w;
+ float n000,n001,n010,n011,n100,n101,n110,n111;
+ float n00,n01,n10,n11;
+ float n0,n1;
+
+ unsigned int x_mask = (x_wrap-1) & 255;
+ unsigned int y_mask = (y_wrap-1) & 255;
+ unsigned int z_mask = (z_wrap-1) & 255;
+ int px = stb__perlin_fastfloor(x);
+ int py = stb__perlin_fastfloor(y);
+ int pz = stb__perlin_fastfloor(z);
+ int x0 = px & x_mask, x1 = (px+1) & x_mask;
+ int y0 = py & y_mask, y1 = (py+1) & y_mask;
+ int z0 = pz & z_mask, z1 = (pz+1) & z_mask;
+ int r0,r1, r00,r01,r10,r11;
+
+ #define stb__perlin_ease(a) (((a*6-15)*a + 10) * a * a * a)
+
+ x -= px; u = stb__perlin_ease(x);
+ y -= py; v = stb__perlin_ease(y);
+ z -= pz; w = stb__perlin_ease(z);
+
+ r0 = stb__perlin_randtab[x0+seed];
+ r1 = stb__perlin_randtab[x1+seed];
+
+ r00 = stb__perlin_randtab[r0+y0];
+ r01 = stb__perlin_randtab[r0+y1];
+ r10 = stb__perlin_randtab[r1+y0];
+ r11 = stb__perlin_randtab[r1+y1];
+
+ n000 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r00+z0], x , y , z );
+ n001 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r00+z1], x , y , z-1 );
+ n010 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r01+z0], x , y-1, z );
+ n011 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r01+z1], x , y-1, z-1 );
+ n100 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r10+z0], x-1, y , z );
+ n101 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r10+z1], x-1, y , z-1 );
+ n110 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r11+z0], x-1, y-1, z );
+ n111 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r11+z1], x-1, y-1, z-1 );
+
+ n00 = stb__perlin_lerp(n000,n001,w);
+ n01 = stb__perlin_lerp(n010,n011,w);
+ n10 = stb__perlin_lerp(n100,n101,w);
+ n11 = stb__perlin_lerp(n110,n111,w);
+
+ n0 = stb__perlin_lerp(n00,n01,v);
+ n1 = stb__perlin_lerp(n10,n11,v);
+
+ return stb__perlin_lerp(n0,n1,u);
+}
+
+float stb_perlin_noise3(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap)
+{
+ return stb_perlin_noise3_internal(x,y,z,x_wrap,y_wrap,z_wrap,0);
+}
+
+float stb_perlin_noise3_seed(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap, int seed)
+{
+ return stb_perlin_noise3_internal(x,y,z,x_wrap,y_wrap,z_wrap, (unsigned char) seed);
+}
+
+float stb_perlin_ridge_noise3(float x, float y, float z, float lacunarity, float gain, float offset, int octaves)
+{
+ int i;
+ float frequency = 1.0f;
+ float prev = 1.0f;
+ float amplitude = 0.5f;
+ float sum = 0.0f;
+
+ for (i = 0; i < octaves; i++) {
+ float r = stb_perlin_noise3_internal(x*frequency,y*frequency,z*frequency,0,0,0,(unsigned char)i);
+ r = offset - (float) fabs(r);
+ r = r*r;
+ sum += r*amplitude*prev;
+ prev = r;
+ frequency *= lacunarity;
+ amplitude *= gain;
+ }
+ return sum;
+}
+
+float stb_perlin_fbm_noise3(float x, float y, float z, float lacunarity, float gain, int octaves)
+{
+ int i;
+ float frequency = 1.0f;
+ float amplitude = 1.0f;
+ float sum = 0.0f;
+
+ for (i = 0; i < octaves; i++) {
+ sum += stb_perlin_noise3_internal(x*frequency,y*frequency,z*frequency,0,0,0,(unsigned char)i)*amplitude;
+ frequency *= lacunarity;
+ amplitude *= gain;
+ }
+ return sum;
+}
+
+float stb_perlin_turbulence_noise3(float x, float y, float z, float lacunarity, float gain, int octaves)
+{
+ int i;
+ float frequency = 1.0f;
+ float amplitude = 1.0f;
+ float sum = 0.0f;
+
+ for (i = 0; i < octaves; i++) {
+ float r = stb_perlin_noise3_internal(x*frequency,y*frequency,z*frequency,0,0,0,(unsigned char)i)*amplitude;
+ sum += (float) fabs(r);
+ frequency *= lacunarity;
+ amplitude *= gain;
+ }
+ return sum;
+}
+
+float stb_perlin_noise3_wrap_nonpow2(float x, float y, float z, int x_wrap, int y_wrap, int z_wrap, unsigned char seed)
+{
+ float u,v,w;
+ float n000,n001,n010,n011,n100,n101,n110,n111;
+ float n00,n01,n10,n11;
+ float n0,n1;
+
+ int px = stb__perlin_fastfloor(x);
+ int py = stb__perlin_fastfloor(y);
+ int pz = stb__perlin_fastfloor(z);
+ int x_wrap2 = (x_wrap ? x_wrap : 256);
+ int y_wrap2 = (y_wrap ? y_wrap : 256);
+ int z_wrap2 = (z_wrap ? z_wrap : 256);
+ int x0 = px % x_wrap2, x1;
+ int y0 = py % y_wrap2, y1;
+ int z0 = pz % z_wrap2, z1;
+ int r0,r1, r00,r01,r10,r11;
+
+ if (x0 < 0) x0 += x_wrap2;
+ if (y0 < 0) y0 += y_wrap2;
+ if (z0 < 0) z0 += z_wrap2;
+ x1 = (x0+1) % x_wrap2;
+ y1 = (y0+1) % y_wrap2;
+ z1 = (z0+1) % z_wrap2;
+
+ #define stb__perlin_ease(a) (((a*6-15)*a + 10) * a * a * a)
+
+ x -= px; u = stb__perlin_ease(x);
+ y -= py; v = stb__perlin_ease(y);
+ z -= pz; w = stb__perlin_ease(z);
+
+ r0 = stb__perlin_randtab[x0];
+ r0 = stb__perlin_randtab[r0+seed];
+ r1 = stb__perlin_randtab[x1];
+ r1 = stb__perlin_randtab[r1+seed];
+
+ r00 = stb__perlin_randtab[r0+y0];
+ r01 = stb__perlin_randtab[r0+y1];
+ r10 = stb__perlin_randtab[r1+y0];
+ r11 = stb__perlin_randtab[r1+y1];
+
+ n000 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r00+z0], x , y , z );
+ n001 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r00+z1], x , y , z-1 );
+ n010 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r01+z0], x , y-1, z );
+ n011 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r01+z1], x , y-1, z-1 );
+ n100 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r10+z0], x-1, y , z );
+ n101 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r10+z1], x-1, y , z-1 );
+ n110 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r11+z0], x-1, y-1, z );
+ n111 = stb__perlin_grad(stb__perlin_randtab_grad_idx[r11+z1], x-1, y-1, z-1 );
+
+ n00 = stb__perlin_lerp(n000,n001,w);
+ n01 = stb__perlin_lerp(n010,n011,w);
+ n10 = stb__perlin_lerp(n100,n101,w);
+ n11 = stb__perlin_lerp(n110,n111,w);
+
+ n0 = stb__perlin_lerp(n00,n01,v);
+ n1 = stb__perlin_lerp(n10,n11,v);
+
+ return stb__perlin_lerp(n0,n1,u);
+}
+#endif // STB_PERLIN_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_rect_pack.h b/vendor/stb/stb_rect_pack.h
new file mode 100644
index 0000000..6a633ce
--- /dev/null
+++ b/vendor/stb/stb_rect_pack.h
@@ -0,0 +1,623 @@
+// stb_rect_pack.h - v1.01 - public domain - rectangle packing
+// Sean Barrett 2014
+//
+// Useful for e.g. packing rectangular textures into an atlas.
+// Does not do rotation.
+//
+// Before #including,
+//
+// #define STB_RECT_PACK_IMPLEMENTATION
+//
+// in the file that you want to have the implementation.
+//
+// Not necessarily the awesomest packing method, but better than
+// the totally naive one in stb_truetype (which is primarily what
+// this is meant to replace).
+//
+// Has only had a few tests run, may have issues.
+//
+// More docs to come.
+//
+// No memory allocations; uses qsort() and assert() from stdlib.
+// Can override those by defining STBRP_SORT and STBRP_ASSERT.
+//
+// This library currently uses the Skyline Bottom-Left algorithm.
+//
+// Please note: better rectangle packers are welcome! Please
+// implement them to the same API, but with a different init
+// function.
+//
+// Credits
+//
+// Library
+// Sean Barrett
+// Minor features
+// Martins Mozeiko
+// github:IntellectualKitty
+//
+// Bugfixes / warning fixes
+// Jeremy Jaussaud
+// Fabian Giesen
+//
+// Version history:
+//
+// 1.01 (2021-07-11) always use large rect mode, expose STBRP__MAXVAL in public section
+// 1.00 (2019-02-25) avoid small space waste; gracefully fail too-wide rectangles
+// 0.99 (2019-02-07) warning fixes
+// 0.11 (2017-03-03) return packing success/fail result
+// 0.10 (2016-10-25) remove cast-away-const to avoid warnings
+// 0.09 (2016-08-27) fix compiler warnings
+// 0.08 (2015-09-13) really fix bug with empty rects (w=0 or h=0)
+// 0.07 (2015-09-13) fix bug with empty rects (w=0 or h=0)
+// 0.06 (2015-04-15) added STBRP_SORT to allow replacing qsort
+// 0.05: added STBRP_ASSERT to allow replacing assert
+// 0.04: fixed minor bug in STBRP_LARGE_RECTS support
+// 0.01: initial release
+//
+// LICENSE
+//
+// See end of file for license information.
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// INCLUDE SECTION
+//
+
+#ifndef STB_INCLUDE_STB_RECT_PACK_H
+#define STB_INCLUDE_STB_RECT_PACK_H
+
+#define STB_RECT_PACK_VERSION 1
+
+#ifdef STBRP_STATIC
+#define STBRP_DEF static
+#else
+#define STBRP_DEF extern
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct stbrp_context stbrp_context;
+typedef struct stbrp_node stbrp_node;
+typedef struct stbrp_rect stbrp_rect;
+
+typedef int stbrp_coord;
+
+#define STBRP__MAXVAL 0x7fffffff
+// Mostly for internal use, but this is the maximum supported coordinate value.
+
+STBRP_DEF int stbrp_pack_rects (stbrp_context *context, stbrp_rect *rects, int num_rects);
+// Assign packed locations to rectangles. The rectangles are of type
+// 'stbrp_rect' defined below, stored in the array 'rects', and there
+// are 'num_rects' many of them.
+//
+// Rectangles which are successfully packed have the 'was_packed' flag
+// set to a non-zero value and 'x' and 'y' store the minimum location
+// on each axis (i.e. bottom-left in cartesian coordinates, top-left
+// if you imagine y increasing downwards). Rectangles which do not fit
+// have the 'was_packed' flag set to 0.
+//
+// You should not try to access the 'rects' array from another thread
+// while this function is running, as the function temporarily reorders
+// the array while it executes.
+//
+// To pack into another rectangle, you need to call stbrp_init_target
+// again. To continue packing into the same rectangle, you can call
+// this function again. Calling this multiple times with multiple rect
+// arrays will probably produce worse packing results than calling it
+// a single time with the full rectangle array, but the option is
+// available.
+//
+// The function returns 1 if all of the rectangles were successfully
+// packed and 0 otherwise.
+
+struct stbrp_rect
+{
+ // reserved for your use:
+ int id;
+
+ // input:
+ stbrp_coord w, h;
+
+ // output:
+ stbrp_coord x, y;
+ int was_packed; // non-zero if valid packing
+
+}; // 16 bytes, nominally
+
+
+STBRP_DEF void stbrp_init_target (stbrp_context *context, int width, int height, stbrp_node *nodes, int num_nodes);
+// Initialize a rectangle packer to:
+// pack a rectangle that is 'width' by 'height' in dimensions
+// using temporary storage provided by the array 'nodes', which is 'num_nodes' long
+//
+// You must call this function every time you start packing into a new target.
+//
+// There is no "shutdown" function. The 'nodes' memory must stay valid for
+// the following stbrp_pack_rects() call (or calls), but can be freed after
+// the call (or calls) finish.
+//
+// Note: to guarantee best results, either:
+// 1. make sure 'num_nodes' >= 'width'
+// or 2. call stbrp_allow_out_of_mem() defined below with 'allow_out_of_mem = 1'
+//
+// If you don't do either of the above things, widths will be quantized to multiples
+// of small integers to guarantee the algorithm doesn't run out of temporary storage.
+//
+// If you do #2, then the non-quantized algorithm will be used, but the algorithm
+// may run out of temporary storage and be unable to pack some rectangles.
+
+STBRP_DEF void stbrp_setup_allow_out_of_mem (stbrp_context *context, int allow_out_of_mem);
+// Optionally call this function after init but before doing any packing to
+// change the handling of the out-of-temp-memory scenario, described above.
+// If you call init again, this will be reset to the default (false).
+
+
+STBRP_DEF void stbrp_setup_heuristic (stbrp_context *context, int heuristic);
+// Optionally select which packing heuristic the library should use. Different
+// heuristics will produce better/worse results for different data sets.
+// If you call init again, this will be reset to the default.
+
+enum
+{
+ STBRP_HEURISTIC_Skyline_default=0,
+ STBRP_HEURISTIC_Skyline_BL_sortHeight = STBRP_HEURISTIC_Skyline_default,
+ STBRP_HEURISTIC_Skyline_BF_sortHeight
+};
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// the details of the following structures don't matter to you, but they must
+// be visible so you can handle the memory allocations for them
+
+struct stbrp_node
+{
+ stbrp_coord x,y;
+ stbrp_node *next;
+};
+
+struct stbrp_context
+{
+ int width;
+ int height;
+ int align;
+ int init_mode;
+ int heuristic;
+ int num_nodes;
+ stbrp_node *active_head;
+ stbrp_node *free_head;
+ stbrp_node extra[2]; // we allocate two extra nodes so optimal user-node-count is 'width' not 'width+2'
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// IMPLEMENTATION SECTION
+//
+
+#ifdef STB_RECT_PACK_IMPLEMENTATION
+#ifndef STBRP_SORT
+#include <stdlib.h>
+#define STBRP_SORT qsort
+#endif
+
+#ifndef STBRP_ASSERT
+#include <assert.h>
+#define STBRP_ASSERT assert
+#endif
+
+#ifdef _MSC_VER
+#define STBRP__NOTUSED(v) (void)(v)
+#define STBRP__CDECL __cdecl
+#else
+#define STBRP__NOTUSED(v) (void)sizeof(v)
+#define STBRP__CDECL
+#endif
+
+enum
+{
+ STBRP__INIT_skyline = 1
+};
+
+STBRP_DEF void stbrp_setup_heuristic(stbrp_context *context, int heuristic)
+{
+ switch (context->init_mode) {
+ case STBRP__INIT_skyline:
+ STBRP_ASSERT(heuristic == STBRP_HEURISTIC_Skyline_BL_sortHeight || heuristic == STBRP_HEURISTIC_Skyline_BF_sortHeight);
+ context->heuristic = heuristic;
+ break;
+ default:
+ STBRP_ASSERT(0);
+ }
+}
+
+STBRP_DEF void stbrp_setup_allow_out_of_mem(stbrp_context *context, int allow_out_of_mem)
+{
+ if (allow_out_of_mem)
+ // if it's ok to run out of memory, then don't bother aligning them;
+ // this gives better packing, but may fail due to OOM (even though
+ // the rectangles easily fit). @TODO a smarter approach would be to only
+ // quantize once we've hit OOM, then we could get rid of this parameter.
+ context->align = 1;
+ else {
+ // if it's not ok to run out of memory, then quantize the widths
+ // so that num_nodes is always enough nodes.
+ //
+ // I.e. num_nodes * align >= width
+ // align >= width / num_nodes
+ // align = ceil(width/num_nodes)
+
+ context->align = (context->width + context->num_nodes-1) / context->num_nodes;
+ }
+}
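The ceiling division in the branch above can be checked in isolation. A minimal sketch (with a hypothetical helper name) confirming that `align = ceil(width / num_nodes)` always satisfies `num_nodes * align >= width`:

```c
#include <assert.h>

/* Integer ceiling division, as used for context->align above:
   (width + num_nodes - 1) / num_nodes == ceil(width / num_nodes),
   which guarantees num_nodes quantized columns cover the full width. */
static int quantize_align(int width, int num_nodes)
{
   return (width + num_nodes - 1) / num_nodes;
}
```

For example, a 1000-wide target with 256 nodes quantizes widths to multiples of 4, and 256 * 4 = 1024 >= 1000, so the packer can never run out of nodes in that mode.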
+
+STBRP_DEF void stbrp_init_target(stbrp_context *context, int width, int height, stbrp_node *nodes, int num_nodes)
+{
+ int i;
+
+ for (i=0; i < num_nodes-1; ++i)
+ nodes[i].next = &nodes[i+1];
+ nodes[i].next = NULL;
+ context->init_mode = STBRP__INIT_skyline;
+ context->heuristic = STBRP_HEURISTIC_Skyline_default;
+ context->free_head = &nodes[0];
+ context->active_head = &context->extra[0];
+ context->width = width;
+ context->height = height;
+ context->num_nodes = num_nodes;
+ stbrp_setup_allow_out_of_mem(context, 0);
+
+ // node 0 is the full width, node 1 is the sentinel (lets us not store width explicitly)
+ context->extra[0].x = 0;
+ context->extra[0].y = 0;
+ context->extra[0].next = &context->extra[1];
+ context->extra[1].x = (stbrp_coord) width;
+ context->extra[1].y = (1<<30);
+ context->extra[1].next = NULL;
+}
+
+// find minimum y position if it starts at x1
+static int stbrp__skyline_find_min_y(stbrp_context *c, stbrp_node *first, int x0, int width, int *pwaste)
+{
+ stbrp_node *node = first;
+ int x1 = x0 + width;
+ int min_y, visited_width, waste_area;
+
+ STBRP__NOTUSED(c);
+
+ STBRP_ASSERT(first->x <= x0);
+
+ #if 0
+ // skip in case we're past the node
+ while (node->next->x <= x0)
+ ++node;
+ #else
+ STBRP_ASSERT(node->next->x > x0); // we ended up handling this in the caller for efficiency
+ #endif
+
+ STBRP_ASSERT(node->x <= x0);
+
+ min_y = 0;
+ waste_area = 0;
+ visited_width = 0;
+ while (node->x < x1) {
+ if (node->y > min_y) {
+ // raise min_y higher.
+ // we've accounted for all waste up to min_y,
+ // but we'll now add more waste for everything we've visited
+ waste_area += visited_width * (node->y - min_y);
+ min_y = node->y;
+ // the first time through, visited_width might be reduced
+ if (node->x < x0)
+ visited_width += node->next->x - x0;
+ else
+ visited_width += node->next->x - node->x;
+ } else {
+ // add waste area
+ int under_width = node->next->x - node->x;
+ if (under_width + visited_width > width)
+ under_width = width - visited_width;
+ waste_area += under_width * (min_y - node->y);
+ visited_width += under_width;
+ }
+ node = node->next;
+ }
+
+ *pwaste = waste_area;
+ return min_y;
+}
+
+typedef struct
+{
+ int x,y;
+ stbrp_node **prev_link;
+} stbrp__findresult;
+
+static stbrp__findresult stbrp__skyline_find_best_pos(stbrp_context *c, int width, int height)
+{
+ int best_waste = (1<<30), best_x, best_y = (1 << 30);
+ stbrp__findresult fr;
+ stbrp_node **prev, *node, *tail, **best = NULL;
+
+ // align to multiple of c->align
+ width = (width + c->align - 1);
+ width -= width % c->align;
+ STBRP_ASSERT(width % c->align == 0);
+
+ // if it can't possibly fit, bail immediately
+ if (width > c->width || height > c->height) {
+ fr.prev_link = NULL;
+ fr.x = fr.y = 0;
+ return fr;
+ }
+
+ node = c->active_head;
+ prev = &c->active_head;
+ while (node->x + width <= c->width) {
+ int y,waste;
+ y = stbrp__skyline_find_min_y(c, node, node->x, width, &waste);
+ if (c->heuristic == STBRP_HEURISTIC_Skyline_BL_sortHeight) { // actually just want to test BL
+ // bottom left
+ if (y < best_y) {
+ best_y = y;
+ best = prev;
+ }
+ } else {
+ // best-fit
+ if (y + height <= c->height) {
+ // can only use it if it fits vertically
+ if (y < best_y || (y == best_y && waste < best_waste)) {
+ best_y = y;
+ best_waste = waste;
+ best = prev;
+ }
+ }
+ }
+ prev = &node->next;
+ node = node->next;
+ }
+
+ best_x = (best == NULL) ? 0 : (*best)->x;
+
+ // if doing best-fit (BF), we also have to try aligning right edge to each node position
+ //
+ // e.g, if fitting
+ //
+ // ____________________
+ // |____________________|
+ //
+ // into
+ //
+ // | |
+ // | ____________|
+ // |____________|
+ //
+ // then right-aligned placement reduces waste, but bottom-left (BL) always chooses left-aligned
+ //
+ // This makes BF take about 2x the time
+
+ if (c->heuristic == STBRP_HEURISTIC_Skyline_BF_sortHeight) {
+ tail = c->active_head;
+ node = c->active_head;
+ prev = &c->active_head;
+ // find first node that's admissible
+ while (tail->x < width)
+ tail = tail->next;
+ while (tail) {
+ int xpos = tail->x - width;
+ int y,waste;
+ STBRP_ASSERT(xpos >= 0);
+ // find the left position that matches this
+ while (node->next->x <= xpos) {
+ prev = &node->next;
+ node = node->next;
+ }
+ STBRP_ASSERT(node->next->x > xpos && node->x <= xpos);
+ y = stbrp__skyline_find_min_y(c, node, xpos, width, &waste);
+ if (y + height <= c->height) {
+ if (y <= best_y) {
+ if (y < best_y || waste < best_waste || (waste==best_waste && xpos < best_x)) {
+ best_x = xpos;
+ STBRP_ASSERT(y <= best_y);
+ best_y = y;
+ best_waste = waste;
+ best = prev;
+ }
+ }
+ }
+ tail = tail->next;
+ }
+ }
+
+ fr.prev_link = best;
+ fr.x = best_x;
+ fr.y = best_y;
+ return fr;
+}
+
+static stbrp__findresult stbrp__skyline_pack_rectangle(stbrp_context *context, int width, int height)
+{
+ // find best position according to heuristic
+ stbrp__findresult res = stbrp__skyline_find_best_pos(context, width, height);
+ stbrp_node *node, *cur;
+
+ // bail if:
+ // 1. it failed
+ // 2. the best node doesn't fit (we don't always check this)
+ // 3. we're out of memory
+ if (res.prev_link == NULL || res.y + height > context->height || context->free_head == NULL) {
+ res.prev_link = NULL;
+ return res;
+ }
+
+ // on success, create new node
+ node = context->free_head;
+ node->x = (stbrp_coord) res.x;
+ node->y = (stbrp_coord) (res.y + height);
+
+ context->free_head = node->next;
+
+ // insert the new node into the right starting point, and
+ // let 'cur' point to the remaining nodes needing to be
+ // stitched back in
+
+ cur = *res.prev_link;
+ if (cur->x < res.x) {
+ // preserve the existing one, so start testing with the next one
+ stbrp_node *next = cur->next;
+ cur->next = node;
+ cur = next;
+ } else {
+ *res.prev_link = node;
+ }
+
+ // from here, traverse cur and free the nodes, until we get to one
+ // that shouldn't be freed
+ while (cur->next && cur->next->x <= res.x + width) {
+ stbrp_node *next = cur->next;
+ // move the current node to the free list
+ cur->next = context->free_head;
+ context->free_head = cur;
+ cur = next;
+ }
+
+ // stitch the list back in
+ node->next = cur;
+
+ if (cur->x < res.x + width)
+ cur->x = (stbrp_coord) (res.x + width);
+
+#ifdef _DEBUG
+ cur = context->active_head;
+ while (cur->x < context->width) {
+ STBRP_ASSERT(cur->x < cur->next->x);
+ cur = cur->next;
+ }
+ STBRP_ASSERT(cur->next == NULL);
+
+ {
+ int count=0;
+ cur = context->active_head;
+ while (cur) {
+ cur = cur->next;
+ ++count;
+ }
+ cur = context->free_head;
+ while (cur) {
+ cur = cur->next;
+ ++count;
+ }
+ STBRP_ASSERT(count == context->num_nodes+2);
+ }
+#endif
+
+ return res;
+}
+
+static int STBRP__CDECL rect_height_compare(const void *a, const void *b)
+{
+ const stbrp_rect *p = (const stbrp_rect *) a;
+ const stbrp_rect *q = (const stbrp_rect *) b;
+ if (p->h > q->h)
+ return -1;
+ if (p->h < q->h)
+ return 1;
+ return (p->w > q->w) ? -1 : (p->w < q->w);
+}
+
+static int STBRP__CDECL rect_original_order(const void *a, const void *b)
+{
+ const stbrp_rect *p = (const stbrp_rect *) a;
+ const stbrp_rect *q = (const stbrp_rect *) b;
+ return (p->was_packed < q->was_packed) ? -1 : (p->was_packed > q->was_packed);
+}
+
+STBRP_DEF int stbrp_pack_rects(stbrp_context *context, stbrp_rect *rects, int num_rects)
+{
+ int i, all_rects_packed = 1;
+
+ // we use the 'was_packed' field internally to allow sorting/unsorting
+ for (i=0; i < num_rects; ++i) {
+ rects[i].was_packed = i;
+ }
+
+ // sort according to heuristic
+ STBRP_SORT(rects, num_rects, sizeof(rects[0]), rect_height_compare);
+
+ for (i=0; i < num_rects; ++i) {
+ if (rects[i].w == 0 || rects[i].h == 0) {
+ rects[i].x = rects[i].y = 0; // empty rect needs no space
+ } else {
+ stbrp__findresult fr = stbrp__skyline_pack_rectangle(context, rects[i].w, rects[i].h);
+ if (fr.prev_link) {
+ rects[i].x = (stbrp_coord) fr.x;
+ rects[i].y = (stbrp_coord) fr.y;
+ } else {
+ rects[i].x = rects[i].y = STBRP__MAXVAL;
+ }
+ }
+ }
+
+ // unsort
+ STBRP_SORT(rects, num_rects, sizeof(rects[0]), rect_original_order);
+
+ // set was_packed flags and all_rects_packed status
+ for (i=0; i < num_rects; ++i) {
+ rects[i].was_packed = !(rects[i].x == STBRP__MAXVAL && rects[i].y == STBRP__MAXVAL);
+ if (!rects[i].was_packed)
+ all_rects_packed = 0;
+ }
+
+ // return the all_rects_packed status
+ return all_rects_packed;
+}
+#endif
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_sprintf.h b/vendor/stb/stb_sprintf.h
new file mode 100644
index 0000000..ca432a6
--- /dev/null
+++ b/vendor/stb/stb_sprintf.h
@@ -0,0 +1,1906 @@
+// stb_sprintf - v1.10 - public domain snprintf() implementation
+// originally by Jeff Roberts / RAD Game Tools, 2015/10/20
+// http://github.com/nothings/stb
+//
+// allowed types: sc uidBboXx p AaGgEef n
+// lengths : hh h ll j z t I64 I32 I
+//
+// Contributors:
+// Fabian "ryg" Giesen (reformatting)
+// github:aganm (attribute format)
+//
+// Contributors (bugfixes):
+// github:d26435
+// github:trex78
+// github:account-login
+// Jari Komppa (SI suffixes)
+// Rohit Nirmal
+// Marcin Wojdyr
+// Leonard Ritter
+// Stefano Zanotti
+// Adam Allison
+// Arvid Gerstmann
+// Markus Kolb
+//
+// LICENSE:
+//
+// See end of file for license information.
+
+#ifndef STB_SPRINTF_H_INCLUDE
+#define STB_SPRINTF_H_INCLUDE
+
+/*
+Single file sprintf replacement.
+
+Originally written by Jeff Roberts at RAD Game Tools - 2015/10/20.
+Hereby placed in public domain.
+
+This is a full sprintf replacement that supports everything that
+the C runtime sprintfs support, including float/double, 64-bit integers,
+hex floats, field parameters (%*.*d stuff), length read-backs, etc.
+
+Why would you need this if sprintf already exists? Well, first off,
+it's *much* faster (see below). It's also much smaller than the CRT
+versions code-space-wise. We've also added some simple improvements
+that are super handy (commas in thousands, callbacks at buffer full,
+for example). Finally, the format strings for MSVC and GCC differ
+for 64-bit integers (among other small things), so this lets you use
+the same format strings in cross platform code.
+
+It uses the standard single file trick of being both the header file
+and the source itself. If you include it normally, you get only
+the header-file function declarations. To get the code, you include
+it from a C or C++ file and define STB_SPRINTF_IMPLEMENTATION first.
+
+It only uses va_arg macros from the C runtime to do its work. It
+does cast doubles to S64s and shifts and divides U64s, which does
+drag in CRT code on most platforms.
+
+It compiles to roughly 8K with float support, and 4K without.
+As a comparison, when using MSVC static libs, calling sprintf drags
+in 16K.
+
+API:
+====
+int stbsp_sprintf( char * buf, char const * fmt, ... )
+int stbsp_snprintf( char * buf, int count, char const * fmt, ... )
+ Convert an arg list into a buffer. stbsp_snprintf always returns
+ a zero-terminated string (unlike regular snprintf).
+
+int stbsp_vsprintf( char * buf, char const * fmt, va_list va )
+int stbsp_vsnprintf( char * buf, int count, char const * fmt, va_list va )
+ Convert a va_list arg list into a buffer. stbsp_vsnprintf always returns
+ a zero-terminated string (unlike regular snprintf).
+
+int stbsp_vsprintfcb( STBSP_SPRINTFCB * callback, void * user, char * buf, char const * fmt, va_list va )
+ typedef char * STBSP_SPRINTFCB( char const * buf, void * user, int len );
+ Convert into a buffer, calling back every STB_SPRINTF_MIN chars.
+ Your callback can then copy the chars out, print them or whatever.
+ This function is actually the workhorse for everything else.
+ The buffer you pass in must hold at least STB_SPRINTF_MIN characters.
+ // you return the next buffer to use or 0 to stop converting
+
+void stbsp_set_separators( char comma, char period )
+ Set the comma and period characters to use.
+
+FLOATS/DOUBLES:
+===============
+This code uses an internal float->ascii conversion method that uses
+doubles with error correction (double-doubles, for ~105 bits of
+precision). This conversion is round-trip perfect - that is, an atof
+of the values output here will give you the bit-exact double back.
+
+One difference is that our insignificant digits will differ from those
+of MSVC or GCC (but those two don't match each other either). We also
+don't attempt to find the minimum length matching float (pre-MSVC15
+doesn't either).
+
+If you don't need float or doubles at all, define STB_SPRINTF_NOFLOAT
+and you'll save 4K of code space.
+
+64-BIT INTS:
+============
+This library also supports 64-bit integers and you can use MSVC style or
+GCC style indicators (%I64d or %lld). It supports the C99 specifiers
+for intmax_t, size_t and ptrdiff_t (%jd %zd %td) as well.
+
+EXTRAS:
+=======
+Like some GCCs, for integers and floats, you can use a ' (single quote)
+specifier and commas will be inserted on the thousands: "%'d" on 12345
+would print 12,345.
+
+For integers and floats, you can use a "$" specifier and the number
+will be converted to float and then divided to get kilo, mega, giga or
+tera and then printed, so "%$d" 1000 is "1.0 k", "%$.2d" 2536000 is
+"2.53 M", etc. For byte values, use two $:s, like "%$$d" to turn
+2536000 to "2.42 Mi". If you prefer JEDEC suffixes to SI ones, use three
+$:s: "%$$$d" -> "2.42 M". To remove the space between the number and the
+suffix, add "_" specifier: "%_$d" -> "2.53M".
+
+In addition to octal and hexadecimal conversions, you can print
+integers in binary: "%b" for 4 would print 100.
+
+PERFORMANCE vs MSVC 2008 32-/64-bit (GCC is even slower than MSVC):
+===================================================================
+"%d" across all 32-bit ints (4.8x/4.0x faster than 32-/64-bit MSVC)
+"%24d" across all 32-bit ints (4.5x/4.2x faster)
+"%x" across all 32-bit ints (4.5x/3.8x faster)
+"%08x" across all 32-bit ints (4.3x/3.8x faster)
+"%f" across e-10 to e+10 floats (7.3x/6.0x faster)
+"%e" across e-10 to e+10 floats (8.1x/6.0x faster)
+"%g" across e-10 to e+10 floats (10.0x/7.1x faster)
+"%f" for values near e-300 (7.9x/6.5x faster)
+"%f" for values near e+300 (10.0x/9.1x faster)
+"%e" for values near e-300 (10.1x/7.0x faster)
+"%e" for values near e+300 (9.2x/6.0x faster)
+"%.320f" for values near e-300 (12.6x/11.2x faster)
+"%a" for random values (8.6x/4.3x faster)
+"%I64d" for 64-bits with 32-bit values (4.8x/3.4x faster)
+"%I64d" for 64-bits > 32-bit values (4.9x/5.5x faster)
+"%s%s%s" for 64 char strings (7.1x/7.3x faster)
+"...512 char string..." ( 35.0x/32.5x faster!)
+*/
+
+#if defined(__clang__)
+ #if defined(__has_feature) && defined(__has_attribute)
+ #if __has_feature(address_sanitizer)
+ #if __has_attribute(__no_sanitize__)
+ #define STBSP__ASAN __attribute__((__no_sanitize__("address")))
+ #elif __has_attribute(__no_sanitize_address__)
+ #define STBSP__ASAN __attribute__((__no_sanitize_address__))
+ #elif __has_attribute(__no_address_safety_analysis__)
+ #define STBSP__ASAN __attribute__((__no_address_safety_analysis__))
+ #endif
+ #endif
+ #endif
+#elif defined(__GNUC__) && (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
+ #if defined(__SANITIZE_ADDRESS__) && __SANITIZE_ADDRESS__
+ #define STBSP__ASAN __attribute__((__no_sanitize_address__))
+ #endif
+#endif
+
+#ifndef STBSP__ASAN
+#define STBSP__ASAN
+#endif
+
+#ifdef STB_SPRINTF_STATIC
+#define STBSP__PUBLICDEC static
+#define STBSP__PUBLICDEF static STBSP__ASAN
+#else
+#ifdef __cplusplus
+#define STBSP__PUBLICDEC extern "C"
+#define STBSP__PUBLICDEF extern "C" STBSP__ASAN
+#else
+#define STBSP__PUBLICDEC extern
+#define STBSP__PUBLICDEF STBSP__ASAN
+#endif
+#endif
+
+#if defined(__has_attribute)
+ #if __has_attribute(format)
+ #define STBSP__ATTRIBUTE_FORMAT(fmt,va) __attribute__((format(printf,fmt,va)))
+ #endif
+#endif
+
+#ifndef STBSP__ATTRIBUTE_FORMAT
+#define STBSP__ATTRIBUTE_FORMAT(fmt,va)
+#endif
+
+#ifdef _MSC_VER
+#define STBSP__NOTUSED(v) (void)(v)
+#else
+#define STBSP__NOTUSED(v) (void)sizeof(v)
+#endif
+
+#include <stdarg.h> // for va_arg(), va_list()
+#include <stddef.h> // size_t, ptrdiff_t
+
+#ifndef STB_SPRINTF_MIN
+#define STB_SPRINTF_MIN 512 // how many characters per callback
+#endif
+typedef char *STBSP_SPRINTFCB(const char *buf, void *user, int len);
+
+#ifndef STB_SPRINTF_DECORATE
+#define STB_SPRINTF_DECORATE(name) stbsp_##name // define this before including if you want to change the names
+#endif
+
+STBSP__PUBLICDEC int STB_SPRINTF_DECORATE(vsprintf)(char *buf, char const *fmt, va_list va);
+STBSP__PUBLICDEC int STB_SPRINTF_DECORATE(vsnprintf)(char *buf, int count, char const *fmt, va_list va);
+STBSP__PUBLICDEC int STB_SPRINTF_DECORATE(sprintf)(char *buf, char const *fmt, ...) STBSP__ATTRIBUTE_FORMAT(2,3);
+STBSP__PUBLICDEC int STB_SPRINTF_DECORATE(snprintf)(char *buf, int count, char const *fmt, ...) STBSP__ATTRIBUTE_FORMAT(3,4);
+
+STBSP__PUBLICDEC int STB_SPRINTF_DECORATE(vsprintfcb)(STBSP_SPRINTFCB *callback, void *user, char *buf, char const *fmt, va_list va);
+STBSP__PUBLICDEC void STB_SPRINTF_DECORATE(set_separators)(char comma, char period);
+
+#endif // STB_SPRINTF_H_INCLUDE
+
+#ifdef STB_SPRINTF_IMPLEMENTATION
+
+#define stbsp__uint32 unsigned int
+#define stbsp__int32 signed int
+
+#ifdef _MSC_VER
+#define stbsp__uint64 unsigned __int64
+#define stbsp__int64 signed __int64
+#else
+#define stbsp__uint64 unsigned long long
+#define stbsp__int64 signed long long
+#endif
+#define stbsp__uint16 unsigned short
+
+#ifndef stbsp__uintptr
+#if defined(__ppc64__) || defined(__powerpc64__) || defined(__aarch64__) || defined(_M_X64) || defined(__x86_64__) || defined(__x86_64) || defined(__s390x__)
+#define stbsp__uintptr stbsp__uint64
+#else
+#define stbsp__uintptr stbsp__uint32
+#endif
+#endif
+
+#ifndef STB_SPRINTF_MSVC_MODE // used for MSVC2013 and earlier (MSVC2015 matches GCC)
+#if defined(_MSC_VER) && (_MSC_VER < 1900)
+#define STB_SPRINTF_MSVC_MODE
+#endif
+#endif
+
+#ifdef STB_SPRINTF_NOUNALIGNED // define this before inclusion to force stbsp_sprintf to always use aligned accesses
+#define STBSP__UNALIGNED(code)
+#else
+#define STBSP__UNALIGNED(code) code
+#endif
+
+#ifndef STB_SPRINTF_NOFLOAT
+// internal float utility functions
+static stbsp__int32 stbsp__real_to_str(char const **start, stbsp__uint32 *len, char *out, stbsp__int32 *decimal_pos, double value, stbsp__uint32 frac_digits);
+static stbsp__int32 stbsp__real_to_parts(stbsp__int64 *bits, stbsp__int32 *expo, double value);
+#define STBSP__SPECIAL 0x7000
+#endif
+
+static char stbsp__period = '.';
+static char stbsp__comma = ',';
+static struct
+{
+ short temp; // force next field to be 2-byte aligned
+ char pair[201];
+} stbsp__digitpair =
+{
+ 0,
+ "00010203040506070809101112131415161718192021222324"
+ "25262728293031323334353637383940414243444546474849"
+ "50515253545556575859606162636465666768697071727374"
+ "75767778798081828384858687888990919293949596979899"
+};
+
+STBSP__PUBLICDEF void STB_SPRINTF_DECORATE(set_separators)(char pcomma, char pperiod)
+{
+ stbsp__period = pperiod;
+ stbsp__comma = pcomma;
+}
+
+#define STBSP__LEFTJUST 1
+#define STBSP__LEADINGPLUS 2
+#define STBSP__LEADINGSPACE 4
+#define STBSP__LEADING_0X 8
+#define STBSP__LEADINGZERO 16
+#define STBSP__INTMAX 32
+#define STBSP__TRIPLET_COMMA 64
+#define STBSP__NEGATIVE 128
+#define STBSP__METRIC_SUFFIX 256
+#define STBSP__HALFWIDTH 512
+#define STBSP__METRIC_NOSPACE 1024
+#define STBSP__METRIC_1024 2048
+#define STBSP__METRIC_JEDEC 4096
+
+static void stbsp__lead_sign(stbsp__uint32 fl, char *sign)
+{
+ sign[0] = 0;
+ if (fl & STBSP__NEGATIVE) {
+ sign[0] = 1;
+ sign[1] = '-';
+ } else if (fl & STBSP__LEADINGSPACE) {
+ sign[0] = 1;
+ sign[1] = ' ';
+ } else if (fl & STBSP__LEADINGPLUS) {
+ sign[0] = 1;
+ sign[1] = '+';
+ }
+}
+
+static STBSP__ASAN stbsp__uint32 stbsp__strlen_limited(char const *s, stbsp__uint32 limit)
+{
+ char const * sn = s;
+
+ // get up to 4-byte alignment
+ for (;;) {
+ if (((stbsp__uintptr)sn & 3) == 0)
+ break;
+
+ if (!limit || *sn == 0)
+ return (stbsp__uint32)(sn - s);
+
+ ++sn;
+ --limit;
+ }
+
+ // scan over 4 bytes at a time to find terminating 0
+ // this will intentionally scan up to 3 bytes past the end of buffers,
+// but because it works 4-byte aligned, it will never cross page boundaries
+ // (hence the STBSP__ASAN markup; the over-read here is intentional
+ // and harmless)
+ while (limit >= 4) {
+ stbsp__uint32 v = *(stbsp__uint32 *)sn;
+ // bit hack to find if there's a 0 byte in there
+ if ((v - 0x01010101) & (~v) & 0x80808080UL)
+ break;
+
+ sn += 4;
+ limit -= 4;
+ }
+
+ // handle the last few characters to find actual size
+ while (limit && *sn) {
+ ++sn;
+ --limit;
+ }
+
+ return (stbsp__uint32)(sn - s);
+}
+
+STBSP__PUBLICDEF int STB_SPRINTF_DECORATE(vsprintfcb)(STBSP_SPRINTFCB *callback, void *user, char *buf, char const *fmt, va_list va)
+{
+ static char hex[] = "0123456789abcdefxp";
+ static char hexu[] = "0123456789ABCDEFXP";
+ char *bf;
+ char const *f;
+ int tlen = 0;
+
+ bf = buf;
+ f = fmt;
+ for (;;) {
+ stbsp__int32 fw, pr, tz;
+ stbsp__uint32 fl;
+
+ // macros for the callback buffer stuff
+ #define stbsp__chk_cb_bufL(bytes) \
+ { \
+ int len = (int)(bf - buf); \
+ if ((len + (bytes)) >= STB_SPRINTF_MIN) { \
+ tlen += len; \
+ if (0 == (bf = buf = callback(buf, user, len))) \
+ goto done; \
+ } \
+ }
+ #define stbsp__chk_cb_buf(bytes) \
+ { \
+ if (callback) { \
+ stbsp__chk_cb_bufL(bytes); \
+ } \
+ }
+ #define stbsp__flush_cb() \
+ { \
+ stbsp__chk_cb_bufL(STB_SPRINTF_MIN - 1); \
+ } // flush if there is even one byte in the buffer
+ #define stbsp__cb_buf_clamp(cl, v) \
+ cl = v; \
+ if (callback) { \
+ int lg = STB_SPRINTF_MIN - (int)(bf - buf); \
+ if (cl > lg) \
+ cl = lg; \
+ }
+
+ // fast copy everything up to the next % (or end of string)
+ for (;;) {
+ while (((stbsp__uintptr)f) & 3) {
+ schk1:
+ if (f[0] == '%')
+ goto scandd;
+ schk2:
+ if (f[0] == 0)
+ goto endfmt;
+ stbsp__chk_cb_buf(1);
+ *bf++ = f[0];
+ ++f;
+ }
+ for (;;) {
+ // Check if the next 4 bytes contain %(0x25) or end of string.
+ // Using the 'hasless' trick:
+ // https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord
+ stbsp__uint32 v, c;
+ v = *(stbsp__uint32 *)f;
+ c = (~v) & 0x80808080;
+ if (((v ^ 0x25252525) - 0x01010101) & c)
+ goto schk1;
+ if ((v - 0x01010101) & c)
+ goto schk2;
+ if (callback)
+ if ((STB_SPRINTF_MIN - (int)(bf - buf)) < 4)
+ goto schk1;
+ #ifdef STB_SPRINTF_NOUNALIGNED
+ if(((stbsp__uintptr)bf) & 3) {
+ bf[0] = f[0];
+ bf[1] = f[1];
+ bf[2] = f[2];
+ bf[3] = f[3];
+ } else
+ #endif
+ {
+ *(stbsp__uint32 *)bf = v;
+ }
+ bf += 4;
+ f += 4;
+ }
+ }
+ scandd:
+
+ ++f;
+
+ // ok, we have a percent, read the modifiers first
+ fw = 0;
+ pr = -1;
+ fl = 0;
+ tz = 0;
+
+ // flags
+ for (;;) {
+ switch (f[0]) {
+ // if we have left justify
+ case '-':
+ fl |= STBSP__LEFTJUST;
+ ++f;
+ continue;
+ // if we have leading plus
+ case '+':
+ fl |= STBSP__LEADINGPLUS;
+ ++f;
+ continue;
+ // if we have leading space
+ case ' ':
+ fl |= STBSP__LEADINGSPACE;
+ ++f;
+ continue;
+ // if we have leading 0x
+ case '#':
+ fl |= STBSP__LEADING_0X;
+ ++f;
+ continue;
+ // if we have thousand commas
+ case '\'':
+ fl |= STBSP__TRIPLET_COMMA;
+ ++f;
+ continue;
+ // if we have kilo marker (none->kilo->kibi->jedec)
+ case '$':
+ if (fl & STBSP__METRIC_SUFFIX) {
+ if (fl & STBSP__METRIC_1024) {
+ fl |= STBSP__METRIC_JEDEC;
+ } else {
+ fl |= STBSP__METRIC_1024;
+ }
+ } else {
+ fl |= STBSP__METRIC_SUFFIX;
+ }
+ ++f;
+ continue;
+ // if we don't want space between metric suffix and number
+ case '_':
+ fl |= STBSP__METRIC_NOSPACE;
+ ++f;
+ continue;
+ // if we have leading zero
+ case '0':
+ fl |= STBSP__LEADINGZERO;
+ ++f;
+ goto flags_done;
+ default: goto flags_done;
+ }
+ }
+ flags_done:
+
+ // get the field width
+ if (f[0] == '*') {
+ fw = va_arg(va, stbsp__uint32);
+ ++f;
+ } else {
+ while ((f[0] >= '0') && (f[0] <= '9')) {
+ fw = fw * 10 + f[0] - '0';
+ f++;
+ }
+ }
+ // get the precision
+ if (f[0] == '.') {
+ ++f;
+ if (f[0] == '*') {
+ pr = va_arg(va, stbsp__uint32);
+ ++f;
+ } else {
+ pr = 0;
+ while ((f[0] >= '0') && (f[0] <= '9')) {
+ pr = pr * 10 + f[0] - '0';
+ f++;
+ }
+ }
+ }
+
+ // handle integer size overrides
+ switch (f[0]) {
+ // are we halfwidth?
+ case 'h':
+ fl |= STBSP__HALFWIDTH;
+ ++f;
+ if (f[0] == 'h')
+ ++f; // QUARTERWIDTH
+ break;
+ // are we 64-bit (unix style)
+ case 'l':
+ fl |= ((sizeof(long) == 8) ? STBSP__INTMAX : 0);
+ ++f;
+ if (f[0] == 'l') {
+ fl |= STBSP__INTMAX;
+ ++f;
+ }
+ break;
+ // are we 64-bit on intmax? (c99)
+ case 'j':
+ fl |= (sizeof(size_t) == 8) ? STBSP__INTMAX : 0;
+ ++f;
+ break;
+ // are we 64-bit on size_t or ptrdiff_t? (c99)
+ case 'z':
+ fl |= (sizeof(ptrdiff_t) == 8) ? STBSP__INTMAX : 0;
+ ++f;
+ break;
+ case 't':
+ fl |= (sizeof(ptrdiff_t) == 8) ? STBSP__INTMAX : 0;
+ ++f;
+ break;
+ // are we 64-bit (msft style)
+ case 'I':
+ if ((f[1] == '6') && (f[2] == '4')) {
+ fl |= STBSP__INTMAX;
+ f += 3;
+ } else if ((f[1] == '3') && (f[2] == '2')) {
+ f += 3;
+ } else {
+ fl |= ((sizeof(void *) == 8) ? STBSP__INTMAX : 0);
+ ++f;
+ }
+ break;
+ default: break;
+ }
+
+ // handle each replacement
+ switch (f[0]) {
+ #define STBSP__NUMSZ 512 // big enough for e308 (with commas) or e-307
+ char num[STBSP__NUMSZ];
+ char lead[8];
+ char tail[8];
+ char *s;
+ char const *h;
+ stbsp__uint32 l, n, cs;
+ stbsp__uint64 n64;
+#ifndef STB_SPRINTF_NOFLOAT
+ double fv;
+#endif
+ stbsp__int32 dp;
+ char const *sn;
+
+ case 's':
+ // get the string
+ s = va_arg(va, char *);
+ if (s == 0)
+ s = (char *)"null";
+ // get the length, limited to desired precision
+ // always limit to ~0u chars since our counts are 32b
+ l = stbsp__strlen_limited(s, (pr >= 0) ? pr : ~0u);
+ lead[0] = 0;
+ tail[0] = 0;
+ pr = 0;
+ dp = 0;
+ cs = 0;
+ // copy the string in
+ goto scopy;
+
+ case 'c': // char
+ // get the character
+ s = num + STBSP__NUMSZ - 1;
+ *s = (char)va_arg(va, int);
+ l = 1;
+ lead[0] = 0;
+ tail[0] = 0;
+ pr = 0;
+ dp = 0;
+ cs = 0;
+ goto scopy;
+
+ case 'n': // weird write-bytes specifier
+ {
+ int *d = va_arg(va, int *);
+ *d = tlen + (int)(bf - buf);
+ } break;
+
+#ifdef STB_SPRINTF_NOFLOAT
+ case 'A': // float
+ case 'a': // hex float
+ case 'G': // float
+ case 'g': // float
+ case 'E': // float
+ case 'e': // float
+ case 'f': // float
+ va_arg(va, double); // eat it
+ s = (char *)"No float";
+ l = 8;
+ lead[0] = 0;
+ tail[0] = 0;
+ pr = 0;
+ cs = 0;
+ STBSP__NOTUSED(dp);
+ goto scopy;
+#else
+ case 'A': // hex float
+ case 'a': // hex float
+ h = (f[0] == 'A') ? hexu : hex;
+ fv = va_arg(va, double);
+ if (pr == -1)
+ pr = 6; // default is 6
+ // read the double into a string
+ if (stbsp__real_to_parts((stbsp__int64 *)&n64, &dp, fv))
+ fl |= STBSP__NEGATIVE;
+
+ s = num + 64;
+
+ stbsp__lead_sign(fl, lead);
+
+ if (dp == -1023)
+ dp = (n64) ? -1022 : 0;
+ else
+ n64 |= (((stbsp__uint64)1) << 52);
+ n64 <<= (64 - 56);
+ if (pr < 15)
+ n64 += ((((stbsp__uint64)8) << 56) >> (pr * 4));
+// add leading chars
+
+#ifdef STB_SPRINTF_MSVC_MODE
+ *s++ = '0';
+ *s++ = 'x';
+#else
+ lead[1 + lead[0]] = '0';
+ lead[2 + lead[0]] = 'x';
+ lead[0] += 2;
+#endif
+ *s++ = h[(n64 >> 60) & 15];
+ n64 <<= 4;
+ if (pr)
+ *s++ = stbsp__period;
+ sn = s;
+
+ // print the bits
+ n = pr;
+ if (n > 13)
+ n = 13;
+ if (pr > (stbsp__int32)n)
+ tz = pr - n;
+ pr = 0;
+ while (n--) {
+ *s++ = h[(n64 >> 60) & 15];
+ n64 <<= 4;
+ }
+
+ // print the expo
+ tail[1] = h[17];
+ if (dp < 0) {
+ tail[2] = '-';
+ dp = -dp;
+ } else
+ tail[2] = '+';
+ n = (dp >= 1000) ? 6 : ((dp >= 100) ? 5 : ((dp >= 10) ? 4 : 3));
+ tail[0] = (char)n;
+ for (;;) {
+ tail[n] = '0' + dp % 10;
+ if (n <= 3)
+ break;
+ --n;
+ dp /= 10;
+ }
+
+ dp = (int)(s - sn);
+ l = (int)(s - (num + 64));
+ s = num + 64;
+ cs = 1 + (3 << 24);
+ goto scopy;
+
+ case 'G': // float
+ case 'g': // float
+ h = (f[0] == 'G') ? hexu : hex;
+ fv = va_arg(va, double);
+ if (pr == -1)
+ pr = 6;
+ else if (pr == 0)
+ pr = 1; // default is 6
+ // read the double into a string
+ if (stbsp__real_to_str(&sn, &l, num, &dp, fv, (pr - 1) | 0x80000000))
+ fl |= STBSP__NEGATIVE;
+
+ // clamp the precision and delete extra zeros after clamp
+ n = pr;
+ if (l > (stbsp__uint32)pr)
+ l = pr;
+ while ((l > 1) && (pr) && (sn[l - 1] == '0')) {
+ --pr;
+ --l;
+ }
+
+ // should we use %e
+ if ((dp <= -4) || (dp > (stbsp__int32)n)) {
+ if (pr > (stbsp__int32)l)
+ pr = l - 1;
+ else if (pr)
+ --pr; // when using %e, there is one digit before the decimal
+ goto doexpfromg;
+ }
+ // this is the insane action to get the pr to match %g semantics for %f
+ if (dp > 0) {
+ pr = (dp < (stbsp__int32)l) ? l - dp : 0;
+ } else {
+ pr = -dp + ((pr > (stbsp__int32)l) ? (stbsp__int32) l : pr);
+ }
+ goto dofloatfromg;
+
+ case 'E': // float
+ case 'e': // float
+ h = (f[0] == 'E') ? hexu : hex;
+ fv = va_arg(va, double);
+ if (pr == -1)
+ pr = 6; // default is 6
+ // read the double into a string
+ if (stbsp__real_to_str(&sn, &l, num, &dp, fv, pr | 0x80000000))
+ fl |= STBSP__NEGATIVE;
+ doexpfromg:
+ tail[0] = 0;
+ stbsp__lead_sign(fl, lead);
+ if (dp == STBSP__SPECIAL) {
+ s = (char *)sn;
+ cs = 0;
+ pr = 0;
+ goto scopy;
+ }
+ s = num + 64;
+ // handle leading chars
+ *s++ = sn[0];
+
+ if (pr)
+ *s++ = stbsp__period;
+
+ // handle after decimal
+ if ((l - 1) > (stbsp__uint32)pr)
+ l = pr + 1;
+ for (n = 1; n < l; n++)
+ *s++ = sn[n];
+ // trailing zeros
+ tz = pr - (l - 1);
+ pr = 0;
+ // dump expo
+ tail[1] = h[0xe];
+ dp -= 1;
+ if (dp < 0) {
+ tail[2] = '-';
+ dp = -dp;
+ } else
+ tail[2] = '+';
+#ifdef STB_SPRINTF_MSVC_MODE
+ n = 5;
+#else
+ n = (dp >= 100) ? 5 : 4;
+#endif
+ tail[0] = (char)n;
+ for (;;) {
+ tail[n] = '0' + dp % 10;
+ if (n <= 3)
+ break;
+ --n;
+ dp /= 10;
+ }
+ cs = 1 + (3 << 24); // how many tens
+ goto flt_lead;
+
+ case 'f': // float
+ fv = va_arg(va, double);
+ doafloat:
+ // do kilos
+ if (fl & STBSP__METRIC_SUFFIX) {
+ double divisor;
+ divisor = 1000.0f;
+ if (fl & STBSP__METRIC_1024)
+ divisor = 1024.0;
+ while (fl < 0x4000000) {
+ if ((fv < divisor) && (fv > -divisor))
+ break;
+ fv /= divisor;
+ fl += 0x1000000;
+ }
+ }
+ if (pr == -1)
+ pr = 6; // default is 6
+ // read the double into a string
+ if (stbsp__real_to_str(&sn, &l, num, &dp, fv, pr))
+ fl |= STBSP__NEGATIVE;
+ dofloatfromg:
+ tail[0] = 0;
+ stbsp__lead_sign(fl, lead);
+ if (dp == STBSP__SPECIAL) {
+ s = (char *)sn;
+ cs = 0;
+ pr = 0;
+ goto scopy;
+ }
+ s = num + 64;
+
+ // handle the three decimal varieties
+ if (dp <= 0) {
+ stbsp__int32 i;
+ // handle 0.000*000xxxx
+ *s++ = '0';
+ if (pr)
+ *s++ = stbsp__period;
+ n = -dp;
+ if ((stbsp__int32)n > pr)
+ n = pr;
+ i = n;
+ while (i) {
+ if ((((stbsp__uintptr)s) & 3) == 0)
+ break;
+ *s++ = '0';
+ --i;
+ }
+ while (i >= 4) {
+ *(stbsp__uint32 *)s = 0x30303030;
+ s += 4;
+ i -= 4;
+ }
+ while (i) {
+ *s++ = '0';
+ --i;
+ }
+ if ((stbsp__int32)(l + n) > pr)
+ l = pr - n;
+ i = l;
+ while (i) {
+ *s++ = *sn++;
+ --i;
+ }
+ tz = pr - (n + l);
+ cs = 1 + (3 << 24); // how many tens did we write (for commas below)
+ } else {
+ cs = (fl & STBSP__TRIPLET_COMMA) ? ((600 - (stbsp__uint32)dp) % 3) : 0;
+ if ((stbsp__uint32)dp >= l) {
+ // handle xxxx000*000.0
+ n = 0;
+ for (;;) {
+ if ((fl & STBSP__TRIPLET_COMMA) && (++cs == 4)) {
+ cs = 0;
+ *s++ = stbsp__comma;
+ } else {
+ *s++ = sn[n];
+ ++n;
+ if (n >= l)
+ break;
+ }
+ }
+ if (n < (stbsp__uint32)dp) {
+ n = dp - n;
+ if ((fl & STBSP__TRIPLET_COMMA) == 0) {
+ while (n) {
+ if ((((stbsp__uintptr)s) & 3) == 0)
+ break;
+ *s++ = '0';
+ --n;
+ }
+ while (n >= 4) {
+ *(stbsp__uint32 *)s = 0x30303030;
+ s += 4;
+ n -= 4;
+ }
+ }
+ while (n) {
+ if ((fl & STBSP__TRIPLET_COMMA) && (++cs == 4)) {
+ cs = 0;
+ *s++ = stbsp__comma;
+ } else {
+ *s++ = '0';
+ --n;
+ }
+ }
+ }
+ cs = (int)(s - (num + 64)) + (3 << 24); // cs is how many tens
+ if (pr) {
+ *s++ = stbsp__period;
+ tz = pr;
+ }
+ } else {
+ // handle xxxxx.xxxx000*000
+ n = 0;
+ for (;;) {
+ if ((fl & STBSP__TRIPLET_COMMA) && (++cs == 4)) {
+ cs = 0;
+ *s++ = stbsp__comma;
+ } else {
+ *s++ = sn[n];
+ ++n;
+ if (n >= (stbsp__uint32)dp)
+ break;
+ }
+ }
+ cs = (int)(s - (num + 64)) + (3 << 24); // cs is how many tens
+ if (pr)
+ *s++ = stbsp__period;
+ if ((l - dp) > (stbsp__uint32)pr)
+ l = pr + dp;
+ while (n < l) {
+ *s++ = sn[n];
+ ++n;
+ }
+ tz = pr - (l - dp);
+ }
+ }
+ pr = 0;
+
+ // handle k,m,g,t
+ if (fl & STBSP__METRIC_SUFFIX) {
+ char idx;
+ idx = 1;
+ if (fl & STBSP__METRIC_NOSPACE)
+ idx = 0;
+ tail[0] = idx;
+ tail[1] = ' ';
+ {
+ if (fl >> 24) { // SI kilo is 'k', JEDEC and SI kibits are 'K'.
+ if (fl & STBSP__METRIC_1024)
+ tail[idx + 1] = "_KMGT"[fl >> 24];
+ else
+ tail[idx + 1] = "_kMGT"[fl >> 24];
+ idx++;
+ // If printing a 1024-based (kibi) unit and not in JEDEC mode, append the 'i'.
+ if (fl & STBSP__METRIC_1024 && !(fl & STBSP__METRIC_JEDEC)) {
+ tail[idx + 1] = 'i';
+ idx++;
+ }
+ tail[0] = idx;
+ }
+ }
+ }
+
+ flt_lead:
+ // get the length that we copied
+ l = (stbsp__uint32)(s - (num + 64));
+ s = num + 64;
+ goto scopy;
+#endif
+
+ case 'B': // upper binary
+ case 'b': // lower binary
+ h = (f[0] == 'B') ? hexu : hex;
+ lead[0] = 0;
+ if (fl & STBSP__LEADING_0X) {
+ lead[0] = 2;
+ lead[1] = '0';
+ lead[2] = h[0xb];
+ }
+ l = (8 << 4) | (1 << 8);
+ goto radixnum;
+
+ case 'o': // octal
+ h = hexu;
+ lead[0] = 0;
+ if (fl & STBSP__LEADING_0X) {
+ lead[0] = 1;
+ lead[1] = '0';
+ }
+ l = (3 << 4) | (3 << 8);
+ goto radixnum;
+
+ case 'p': // pointer
+ fl |= (sizeof(void *) == 8) ? STBSP__INTMAX : 0;
+ pr = sizeof(void *) * 2;
+ fl &= ~STBSP__LEADINGZERO; // 'p' only prints the pointer with zeros
+ // fall through - to X
+
+ case 'X': // upper hex
+ case 'x': // lower hex
+ h = (f[0] == 'X') ? hexu : hex;
+ l = (4 << 4) | (4 << 8);
+ lead[0] = 0;
+ if (fl & STBSP__LEADING_0X) {
+ lead[0] = 2;
+ lead[1] = '0';
+ lead[2] = h[16];
+ }
+ radixnum:
+ // get the number
+ if (fl & STBSP__INTMAX)
+ n64 = va_arg(va, stbsp__uint64);
+ else
+ n64 = va_arg(va, stbsp__uint32);
+
+ s = num + STBSP__NUMSZ;
+ dp = 0;
+ // clear tail, and clear leading if value is zero
+ tail[0] = 0;
+ if (n64 == 0) {
+ lead[0] = 0;
+ if (pr == 0) {
+ l = 0;
+ cs = 0;
+ goto scopy;
+ }
+ }
+ // convert to string
+ for (;;) {
+ *--s = h[n64 & ((1 << (l >> 8)) - 1)];
+ n64 >>= (l >> 8);
+ if (!((n64) || ((stbsp__int32)((num + STBSP__NUMSZ) - s) < pr)))
+ break;
+ if (fl & STBSP__TRIPLET_COMMA) {
+ ++l;
+ if ((l & 15) == ((l >> 4) & 15)) {
+ l &= ~15;
+ *--s = stbsp__comma;
+ }
+ }
+ }
+ // get the tens and the comma pos
+ cs = (stbsp__uint32)((num + STBSP__NUMSZ) - s) + ((((l >> 4) & 15)) << 24);
+ // get the length that we copied
+ l = (stbsp__uint32)((num + STBSP__NUMSZ) - s);
+ // copy it
+ goto scopy;
+
+ case 'u': // unsigned
+ case 'i':
+ case 'd': // integer
+ // get the integer and abs it
+ if (fl & STBSP__INTMAX) {
+ stbsp__int64 i64 = va_arg(va, stbsp__int64);
+ n64 = (stbsp__uint64)i64;
+ if ((f[0] != 'u') && (i64 < 0)) {
+ n64 = (stbsp__uint64)-i64;
+ fl |= STBSP__NEGATIVE;
+ }
+ } else {
+ stbsp__int32 i = va_arg(va, stbsp__int32);
+ n64 = (stbsp__uint32)i;
+ if ((f[0] != 'u') && (i < 0)) {
+ n64 = (stbsp__uint32)-i;
+ fl |= STBSP__NEGATIVE;
+ }
+ }
+
+#ifndef STB_SPRINTF_NOFLOAT
+ if (fl & STBSP__METRIC_SUFFIX) {
+ if (n64 < 1024)
+ pr = 0;
+ else if (pr == -1)
+ pr = 1;
+ fv = (double)(stbsp__int64)n64;
+ goto doafloat;
+ }
+#endif
+
+ // convert to string
+ s = num + STBSP__NUMSZ;
+ l = 0;
+
+ for (;;) {
+ // do in 32-bit chunks (avoid lots of 64-bit divides even with constant denominators)
+ char *o = s - 8;
+ if (n64 >= 100000000) {
+ n = (stbsp__uint32)(n64 % 100000000);
+ n64 /= 100000000;
+ } else {
+ n = (stbsp__uint32)n64;
+ n64 = 0;
+ }
+ if ((fl & STBSP__TRIPLET_COMMA) == 0) {
+ do {
+ s -= 2;
+ *(stbsp__uint16 *)s = *(stbsp__uint16 *)&stbsp__digitpair.pair[(n % 100) * 2];
+ n /= 100;
+ } while (n);
+ }
+ while (n) {
+ if ((fl & STBSP__TRIPLET_COMMA) && (l++ == 3)) {
+ l = 0;
+ *--s = stbsp__comma;
+ --o;
+ } else {
+ *--s = (char)(n % 10) + '0';
+ n /= 10;
+ }
+ }
+ if (n64 == 0) {
+ if ((s[0] == '0') && (s != (num + STBSP__NUMSZ)))
+ ++s;
+ break;
+ }
+ while (s != o)
+ if ((fl & STBSP__TRIPLET_COMMA) && (l++ == 3)) {
+ l = 0;
+ *--s = stbsp__comma;
+ --o;
+ } else {
+ *--s = '0';
+ }
+ }
+
+ tail[0] = 0;
+ stbsp__lead_sign(fl, lead);
+
+ // get the length that we copied
+ l = (stbsp__uint32)((num + STBSP__NUMSZ) - s);
+ if (l == 0) {
+ *--s = '0';
+ l = 1;
+ }
+ cs = l + (3 << 24);
+ if (pr < 0)
+ pr = 0;
+
+ scopy:
+ // get fw=leading/trailing space, pr=leading zeros
+ if (pr < (stbsp__int32)l)
+ pr = l;
+ n = pr + lead[0] + tail[0] + tz;
+ if (fw < (stbsp__int32)n)
+ fw = n;
+ fw -= n;
+ pr -= l;
+
+ // handle right justify and leading zeros
+ if ((fl & STBSP__LEFTJUST) == 0) {
+ if (fl & STBSP__LEADINGZERO) // if leading zeros, everything is in pr
+ {
+ pr = (fw > pr) ? fw : pr;
+ fw = 0;
+ } else {
+ fl &= ~STBSP__TRIPLET_COMMA; // if no leading zeros, then no commas
+ }
+ }
+
+ // copy the spaces and/or zeros
+ if (fw + pr) {
+ stbsp__int32 i;
+ stbsp__uint32 c;
+
+ // copy leading spaces (or when doing %8.4d stuff)
+ if ((fl & STBSP__LEFTJUST) == 0)
+ while (fw > 0) {
+ stbsp__cb_buf_clamp(i, fw);
+ fw -= i;
+ while (i) {
+ if ((((stbsp__uintptr)bf) & 3) == 0)
+ break;
+ *bf++ = ' ';
+ --i;
+ }
+ while (i >= 4) {
+ *(stbsp__uint32 *)bf = 0x20202020;
+ bf += 4;
+ i -= 4;
+ }
+ while (i) {
+ *bf++ = ' ';
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+
+ // copy leader
+ sn = lead + 1;
+ while (lead[0]) {
+ stbsp__cb_buf_clamp(i, lead[0]);
+ lead[0] -= (char)i;
+ while (i) {
+ *bf++ = *sn++;
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+
+ // copy leading zeros
+ c = cs >> 24;
+ cs &= 0xffffff;
+ cs = (fl & STBSP__TRIPLET_COMMA) ? ((stbsp__uint32)(c - ((pr + cs) % (c + 1)))) : 0;
+ while (pr > 0) {
+ stbsp__cb_buf_clamp(i, pr);
+ pr -= i;
+ if ((fl & STBSP__TRIPLET_COMMA) == 0) {
+ while (i) {
+ if ((((stbsp__uintptr)bf) & 3) == 0)
+ break;
+ *bf++ = '0';
+ --i;
+ }
+ while (i >= 4) {
+ *(stbsp__uint32 *)bf = 0x30303030;
+ bf += 4;
+ i -= 4;
+ }
+ }
+ while (i) {
+ if ((fl & STBSP__TRIPLET_COMMA) && (cs++ == c)) {
+ cs = 0;
+ *bf++ = stbsp__comma;
+ } else
+ *bf++ = '0';
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+ }
+
+ // copy leader if there is still one
+ sn = lead + 1;
+ while (lead[0]) {
+ stbsp__int32 i;
+ stbsp__cb_buf_clamp(i, lead[0]);
+ lead[0] -= (char)i;
+ while (i) {
+ *bf++ = *sn++;
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+
+ // copy the string
+ n = l;
+ while (n) {
+ stbsp__int32 i;
+ stbsp__cb_buf_clamp(i, n);
+ n -= i;
+ STBSP__UNALIGNED(while (i >= 4) {
+ *(stbsp__uint32 volatile *)bf = *(stbsp__uint32 volatile *)s;
+ bf += 4;
+ s += 4;
+ i -= 4;
+ })
+ while (i) {
+ *bf++ = *s++;
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+
+ // copy trailing zeros
+ while (tz) {
+ stbsp__int32 i;
+ stbsp__cb_buf_clamp(i, tz);
+ tz -= i;
+ while (i) {
+ if ((((stbsp__uintptr)bf) & 3) == 0)
+ break;
+ *bf++ = '0';
+ --i;
+ }
+ while (i >= 4) {
+ *(stbsp__uint32 *)bf = 0x30303030;
+ bf += 4;
+ i -= 4;
+ }
+ while (i) {
+ *bf++ = '0';
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+
+ // copy tail if there is one
+ sn = tail + 1;
+ while (tail[0]) {
+ stbsp__int32 i;
+ stbsp__cb_buf_clamp(i, tail[0]);
+ tail[0] -= (char)i;
+ while (i) {
+ *bf++ = *sn++;
+ --i;
+ }
+ stbsp__chk_cb_buf(1);
+ }
+
+ // handle the left justify
+ if (fl & STBSP__LEFTJUST)
+ if (fw > 0) {
+ while (fw) {
+ stbsp__int32 i;
+ stbsp__cb_buf_clamp(i, fw);
+ fw -= i;
+ while (i) {
+ if ((((stbsp__uintptr)bf) & 3) == 0)
+ break;
+ *bf++ = ' ';
+ --i;
+ }
+ while (i >= 4) {
+ *(stbsp__uint32 *)bf = 0x20202020;
+ bf += 4;
+ i -= 4;
+ }
+ while (i--)
+ *bf++ = ' ';
+ stbsp__chk_cb_buf(1);
+ }
+ }
+ break;
+
+ default: // unknown, just copy code
+ s = num + STBSP__NUMSZ - 1;
+ *s = f[0];
+ l = 1;
+ fw = fl = 0;
+ lead[0] = 0;
+ tail[0] = 0;
+ pr = 0;
+ dp = 0;
+ cs = 0;
+ goto scopy;
+ }
+ ++f;
+ }
+endfmt:
+
+ if (!callback)
+ *bf = 0;
+ else
+ stbsp__flush_cb();
+
+done:
+ return tlen + (int)(bf - buf);
+}
+
+// cleanup
+#undef STBSP__LEFTJUST
+#undef STBSP__LEADINGPLUS
+#undef STBSP__LEADINGSPACE
+#undef STBSP__LEADING_0X
+#undef STBSP__LEADINGZERO
+#undef STBSP__INTMAX
+#undef STBSP__TRIPLET_COMMA
+#undef STBSP__NEGATIVE
+#undef STBSP__METRIC_SUFFIX
+#undef STBSP__NUMSZ
+#undef stbsp__chk_cb_bufL
+#undef stbsp__chk_cb_buf
+#undef stbsp__flush_cb
+#undef stbsp__cb_buf_clamp
+
+// ============================================================================
+// wrapper functions
+
+STBSP__PUBLICDEF int STB_SPRINTF_DECORATE(sprintf)(char *buf, char const *fmt, ...)
+{
+ int result;
+ va_list va;
+ va_start(va, fmt);
+ result = STB_SPRINTF_DECORATE(vsprintfcb)(0, 0, buf, fmt, va);
+ va_end(va);
+ return result;
+}
+
+typedef struct stbsp__context {
+ char *buf;
+ int count;
+ int length;
+ char tmp[STB_SPRINTF_MIN];
+} stbsp__context;
+
+static char *stbsp__clamp_callback(const char *buf, void *user, int len)
+{
+ stbsp__context *c = (stbsp__context *)user;
+ c->length += len;
+
+ if (len > c->count)
+ len = c->count;
+
+ if (len) {
+ if (buf != c->buf) {
+ const char *s, *se;
+ char *d;
+ d = c->buf;
+ s = buf;
+ se = buf + len;
+ do {
+ *d++ = *s++;
+ } while (s < se);
+ }
+ c->buf += len;
+ c->count -= len;
+ }
+
+ if (c->count <= 0)
+ return c->tmp;
+ return (c->count >= STB_SPRINTF_MIN) ? c->buf : c->tmp; // go direct into buffer if you can
+}
+
+static char *stbsp__count_clamp_callback(const char *buf, void *user, int len)
+{
+   stbsp__context *c = (stbsp__context *)user;
+   (void)sizeof(buf);
+
+   c->length += len;
+   return c->tmp; // output is discarded; only the total length is accumulated
+}
+
+STBSP__PUBLICDEF int STB_SPRINTF_DECORATE(vsnprintf)(char *buf, int count, char const *fmt, va_list va)
+{
+   stbsp__context c;
+
+   if ((count == 0) && !buf) {
+      c.length = 0;
+
+      STB_SPRINTF_DECORATE(vsprintfcb)(stbsp__count_clamp_callback, &c, c.tmp, fmt, va);
+   } else {
+      int l;
+
+      c.buf = buf;
+      c.count = count;
+      c.length = 0;
+
+      STB_SPRINTF_DECORATE(vsprintfcb)(stbsp__clamp_callback, &c, stbsp__clamp_callback(0, &c, 0), fmt, va);
+
+      // zero-terminate
+      l = (int)(c.buf - buf);
+      if (l >= count) // should never be greater than count, only equal or less
+         l = count - 1;
+      buf[l] = 0;
+   }
+
+   return c.length;
+}
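The clamp-callback pair above is what lets vsnprintf report the full untruncated length while writing only as much as fits. A standalone sketch of that pattern follows; the names are hypothetical stand-ins, not part of stb_sprintf:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for stbsp__context / stbsp__clamp_callback. */
typedef struct {
   char *buf;    /* destination cursor */
   int   count;  /* space remaining in the destination */
   int   length; /* total length that would have been written */
} clamp_sink;

static void sink_write(clamp_sink *c, const char *chunk, int len)
{
   c->length += len;       /* always count the full, untruncated output */
   if (len > c->count)
      len = c->count;      /* ...but copy only what still fits */
   memcpy(c->buf, chunk, (size_t)len);
   c->buf += len;
   c->count -= len;
}
```

After formatting, `length` holds the would-be size and `count` tells whether truncation occurred, which is exactly how the count==0/buf==NULL query mode works.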
+
+STBSP__PUBLICDEF int STB_SPRINTF_DECORATE(snprintf)(char *buf, int count, char const *fmt, ...)
+{
+ int result;
+ va_list va;
+ va_start(va, fmt);
+
+ result = STB_SPRINTF_DECORATE(vsnprintf)(buf, count, fmt, va);
+ va_end(va);
+
+ return result;
+}
+
+STBSP__PUBLICDEF int STB_SPRINTF_DECORATE(vsprintf)(char *buf, char const *fmt, va_list va)
+{
+ return STB_SPRINTF_DECORATE(vsprintfcb)(0, 0, buf, fmt, va);
+}
+
+// =======================================================================
+// low level float utility functions
+
+#ifndef STB_SPRINTF_NOFLOAT
+
+// copies d to bits w/ strict aliasing (this compiles to nothing on /Ox)
+#define STBSP__COPYFP(dest, src) \
+ { \
+ int cn; \
+ for (cn = 0; cn < 8; cn++) \
+ ((char *)&dest)[cn] = ((char *)&src)[cn]; \
+ }
+
+// get float info
+static stbsp__int32 stbsp__real_to_parts(stbsp__int64 *bits, stbsp__int32 *expo, double value)
+{
+ double d;
+ stbsp__int64 b = 0;
+
+ // load value and round at the frac_digits
+ d = value;
+
+ STBSP__COPYFP(b, d);
+
+ *bits = b & ((((stbsp__uint64)1) << 52) - 1);
+ *expo = (stbsp__int32)(((b >> 52) & 2047) - 1023);
+
+ return (stbsp__int32)((stbsp__uint64) b >> 63);
+}
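stbsp__real_to_parts above unpacks the IEEE-754 binary64 layout: 1 sign bit, 11 biased exponent bits, 52 mantissa bits. A self-contained sketch of the same extraction, using memcpy as the strict-aliasing-safe type pun (function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decompose a double into sign, unbiased exponent, and raw mantissa bits,
   mirroring what STBSP__COPYFP plus the masks in stbsp__real_to_parts do. */
static void double_to_parts(double d, int *sign, int32_t *expo, uint64_t *mant)
{
   uint64_t b;
   memcpy(&b, &d, sizeof b);                    /* safe bit copy */
   *sign = (int)(b >> 63);                      /* top bit is the sign */
   *expo = (int32_t)((b >> 52) & 2047) - 1023;  /* remove the exponent bias */
   *mant = b & ((UINT64_C(1) << 52) - 1);       /* 52 mantissa bits */
}
```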
+
+static double const stbsp__bot[23] = {
+ 1e+000, 1e+001, 1e+002, 1e+003, 1e+004, 1e+005, 1e+006, 1e+007, 1e+008, 1e+009, 1e+010, 1e+011,
+ 1e+012, 1e+013, 1e+014, 1e+015, 1e+016, 1e+017, 1e+018, 1e+019, 1e+020, 1e+021, 1e+022
+};
+static double const stbsp__negbot[22] = {
+ 1e-001, 1e-002, 1e-003, 1e-004, 1e-005, 1e-006, 1e-007, 1e-008, 1e-009, 1e-010, 1e-011,
+ 1e-012, 1e-013, 1e-014, 1e-015, 1e-016, 1e-017, 1e-018, 1e-019, 1e-020, 1e-021, 1e-022
+};
+static double const stbsp__negboterr[22] = {
+ -5.551115123125783e-018, -2.0816681711721684e-019, -2.0816681711721686e-020, -4.7921736023859299e-021, -8.1803053914031305e-022, 4.5251888174113741e-023,
+ 4.5251888174113739e-024, -2.0922560830128471e-025, -6.2281591457779853e-026, -3.6432197315497743e-027, 6.0503030718060191e-028, 2.0113352370744385e-029,
+ -3.0373745563400371e-030, 1.1806906454401013e-032, -7.7705399876661076e-032, 2.0902213275965398e-033, -7.1542424054621921e-034, -7.1542424054621926e-035,
+ 2.4754073164739869e-036, 5.4846728545790429e-037, 9.2462547772103625e-038, -4.8596774326570872e-039
+};
+static double const stbsp__top[13] = {
+ 1e+023, 1e+046, 1e+069, 1e+092, 1e+115, 1e+138, 1e+161, 1e+184, 1e+207, 1e+230, 1e+253, 1e+276, 1e+299
+};
+static double const stbsp__negtop[13] = {
+ 1e-023, 1e-046, 1e-069, 1e-092, 1e-115, 1e-138, 1e-161, 1e-184, 1e-207, 1e-230, 1e-253, 1e-276, 1e-299
+};
+static double const stbsp__toperr[13] = {
+ 8388608,
+ 6.8601809640529717e+028,
+ -7.253143638152921e+052,
+ -4.3377296974619174e+075,
+ -1.5559416129466825e+098,
+ -3.2841562489204913e+121,
+ -3.7745893248228135e+144,
+ -1.7356668416969134e+167,
+ -3.8893577551088374e+190,
+ -9.9566444326005119e+213,
+ 6.3641293062232429e+236,
+ -5.2069140800249813e+259,
+ -5.2504760255204387e+282
+};
+static double const stbsp__negtoperr[13] = {
+ 3.9565301985100693e-040, -2.299904345391321e-063, 3.6506201437945798e-086, 1.1875228833981544e-109,
+ -5.0644902316928607e-132, -6.7156837247865426e-155, -2.812077463003139e-178, -5.7778912386589953e-201,
+ 7.4997100559334532e-224, -4.6439668915134491e-247, -6.3691100762962136e-270, -9.436808465446358e-293,
+ 8.0970921678014997e-317
+};
+
+#if defined(_MSC_VER) && (_MSC_VER <= 1200)
+static stbsp__uint64 const stbsp__powten[20] = {
+ 1,
+ 10,
+ 100,
+ 1000,
+ 10000,
+ 100000,
+ 1000000,
+ 10000000,
+ 100000000,
+ 1000000000,
+ 10000000000,
+ 100000000000,
+ 1000000000000,
+ 10000000000000,
+ 100000000000000,
+ 1000000000000000,
+ 10000000000000000,
+ 100000000000000000,
+ 1000000000000000000,
+ 10000000000000000000U
+};
+#define stbsp__tento19th ((stbsp__uint64)1000000000000000000)
+#else
+static stbsp__uint64 const stbsp__powten[20] = {
+ 1,
+ 10,
+ 100,
+ 1000,
+ 10000,
+ 100000,
+ 1000000,
+ 10000000,
+ 100000000,
+ 1000000000,
+ 10000000000ULL,
+ 100000000000ULL,
+ 1000000000000ULL,
+ 10000000000000ULL,
+ 100000000000000ULL,
+ 1000000000000000ULL,
+ 10000000000000000ULL,
+ 100000000000000000ULL,
+ 1000000000000000000ULL,
+ 10000000000000000000ULL
+};
+#define stbsp__tento19th (1000000000000000000ULL)
+#endif
+
+#define stbsp__ddmulthi(oh, ol, xh, yh) \
+ { \
+ double ahi = 0, alo, bhi = 0, blo; \
+ stbsp__int64 bt; \
+ oh = xh * yh; \
+ STBSP__COPYFP(bt, xh); \
+ bt &= ((~(stbsp__uint64)0) << 27); \
+ STBSP__COPYFP(ahi, bt); \
+ alo = xh - ahi; \
+ STBSP__COPYFP(bt, yh); \
+ bt &= ((~(stbsp__uint64)0) << 27); \
+ STBSP__COPYFP(bhi, bt); \
+ blo = yh - bhi; \
+ ol = ((ahi * bhi - oh) + ahi * blo + alo * bhi) + alo * blo; \
+ }
+
+#define stbsp__ddtoS64(ob, xh, xl) \
+ { \
+ double ahi = 0, alo, vh, t; \
+ ob = (stbsp__int64)xh; \
+ vh = (double)ob; \
+ ahi = (xh - vh); \
+ t = (ahi - xh); \
+ alo = (xh - (ahi - t)) - (vh + t); \
+ ob += (stbsp__int64)(ahi + alo + xl); \
+ }
+
+#define stbsp__ddrenorm(oh, ol) \
+ { \
+ double s; \
+ s = oh + ol; \
+ ol = ol - (s - oh); \
+ oh = s; \
+ }
+
+#define stbsp__ddmultlo(oh, ol, xh, xl, yh, yl) ol = ol + (xh * yl + xl * yh);
+
+#define stbsp__ddmultlos(oh, ol, xh, yl) ol = ol + (xh * yl);
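stbsp__ddmulthi above computes a "double-double" product: the rounded product plus a second double carrying the rounding error, obtained by splitting each factor at 27 mantissa bits. A standalone sketch of that split-and-recover idea (hypothetical name; the real macro works on its caller's locals):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncate x and y to their high mantissa bits (the 27-bit mask used by
   stbsp__ddmulthi), then recover the rounding error of the product so that
   hi + lo represents x*y to roughly twice working precision. */
static void two_product(double x, double y, double *hi, double *lo)
{
   uint64_t b;
   double xh, xl, yh, yl;
   *hi = x * y;
   memcpy(&b, &x, 8);
   b &= ~UINT64_C(0) << 27;   /* keep only the high mantissa bits */
   memcpy(&xh, &b, 8);
   xl = x - xh;
   memcpy(&b, &y, 8);
   b &= ~UINT64_C(0) << 27;
   memcpy(&yh, &b, 8);
   yl = y - yh;
   /* error term: exact partial products minus the rounded product */
   *lo = ((xh * yh - *hi) + xh * yl + xl * yh) + xl * yl;
}
```

When the exact product fits in a double the recovered tail is zero; otherwise the tail captures the bits the rounded head lost.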
+
+static void stbsp__raise_to_power10(double *ohi, double *olo, double d, stbsp__int32 power) // power can be -323 to +350
+{
+ double ph, pl;
+ if ((power >= 0) && (power <= 22)) {
+ stbsp__ddmulthi(ph, pl, d, stbsp__bot[power]);
+ } else {
+ stbsp__int32 e, et, eb;
+ double p2h, p2l;
+
+ e = power;
+ if (power < 0)
+ e = -e;
+ et = (e * 0x2c9) >> 14; /* %23 */
+ if (et > 13)
+ et = 13;
+ eb = e - (et * 23);
+
+ ph = d;
+ pl = 0.0;
+ if (power < 0) {
+ if (eb) {
+ --eb;
+ stbsp__ddmulthi(ph, pl, d, stbsp__negbot[eb]);
+ stbsp__ddmultlos(ph, pl, d, stbsp__negboterr[eb]);
+ }
+ if (et) {
+ stbsp__ddrenorm(ph, pl);
+ --et;
+ stbsp__ddmulthi(p2h, p2l, ph, stbsp__negtop[et]);
+ stbsp__ddmultlo(p2h, p2l, ph, pl, stbsp__negtop[et], stbsp__negtoperr[et]);
+ ph = p2h;
+ pl = p2l;
+ }
+ } else {
+ if (eb) {
+ e = eb;
+ if (eb > 22)
+ eb = 22;
+ e -= eb;
+ stbsp__ddmulthi(ph, pl, d, stbsp__bot[eb]);
+ if (e) {
+ stbsp__ddrenorm(ph, pl);
+ stbsp__ddmulthi(p2h, p2l, ph, stbsp__bot[e]);
+ stbsp__ddmultlos(p2h, p2l, stbsp__bot[e], pl);
+ ph = p2h;
+ pl = p2l;
+ }
+ }
+ if (et) {
+ stbsp__ddrenorm(ph, pl);
+ --et;
+ stbsp__ddmulthi(p2h, p2l, ph, stbsp__top[et]);
+ stbsp__ddmultlo(p2h, p2l, ph, pl, stbsp__top[et], stbsp__toperr[et]);
+ ph = p2h;
+ pl = p2l;
+ }
+ }
+ }
+ stbsp__ddrenorm(ph, pl);
+ *ohi = ph;
+ *olo = pl;
+}
+
+// given a float value, returns the significant bits in bits, and the position of the
+// decimal point in decimal_pos. +/-INF and NAN are specified by special values
+// returned in the decimal_pos parameter.
+// frac_digits is normally an absolute digit count, but if the 0x80000000 bit is
+// set, it counts from the first significant digit instead (used for %g and %e).
+static stbsp__int32 stbsp__real_to_str(char const **start, stbsp__uint32 *len, char *out, stbsp__int32 *decimal_pos, double value, stbsp__uint32 frac_digits)
+{
+ double d;
+ stbsp__int64 bits = 0;
+ stbsp__int32 expo, e, ng, tens;
+
+ d = value;
+ STBSP__COPYFP(bits, d);
+ expo = (stbsp__int32)((bits >> 52) & 2047);
+ ng = (stbsp__int32)((stbsp__uint64) bits >> 63);
+ if (ng)
+ d = -d;
+
+ if (expo == 2047) // is nan or inf?
+ {
+ *start = (bits & ((((stbsp__uint64)1) << 52) - 1)) ? "NaN" : "Inf";
+ *decimal_pos = STBSP__SPECIAL;
+ *len = 3;
+ return ng;
+ }
+
+ if (expo == 0) // is zero or denormal
+ {
+ if (((stbsp__uint64) bits << 1) == 0) // do zero
+ {
+ *decimal_pos = 1;
+ *start = out;
+ out[0] = '0';
+ *len = 1;
+ return ng;
+ }
+ // find the right expo for denormals
+ {
+ stbsp__int64 v = ((stbsp__uint64)1) << 51;
+ while ((bits & v) == 0) {
+ --expo;
+ v >>= 1;
+ }
+ }
+ }
+
+ // find the decimal exponent as well as the decimal bits of the value
+ {
+ double ph, pl;
+
+ // log10 estimate - very specifically tweaked to hit or undershoot by no more than 1 of log10 of all expos 1..2046
+ tens = expo - 1023;
+ tens = (tens < 0) ? ((tens * 617) / 2048) : (((tens * 1233) / 4096) + 1);
+
+ // move the significant bits into position and stick them into an int
+ stbsp__raise_to_power10(&ph, &pl, d, 18 - tens);
+
+ // get as much precision from the double-double representation as possible
+ stbsp__ddtoS64(bits, ph, pl);
+
+ // check if we undershot
+ if (((stbsp__uint64)bits) >= stbsp__tento19th)
+ ++tens;
+ }
+
+ // now do the rounding in integer land
+ frac_digits = (frac_digits & 0x80000000) ? ((frac_digits & 0x7ffffff) + 1) : (tens + frac_digits);
+ if ((frac_digits < 24)) {
+ stbsp__uint32 dg = 1;
+ if ((stbsp__uint64)bits >= stbsp__powten[9])
+ dg = 10;
+ while ((stbsp__uint64)bits >= stbsp__powten[dg]) {
+ ++dg;
+ if (dg == 20)
+ goto noround;
+ }
+ if (frac_digits < dg) {
+ stbsp__uint64 r;
+ // add 0.5 at the right position and round
+ e = dg - frac_digits;
+ if ((stbsp__uint32)e >= 24)
+ goto noround;
+ r = stbsp__powten[e];
+ bits = bits + (r / 2);
+ if ((stbsp__uint64)bits >= stbsp__powten[dg])
+ ++tens;
+ bits /= r;
+ }
+ noround:;
+ }
+
+ // kill long trailing runs of zeros
+ if (bits) {
+ stbsp__uint32 n;
+ for (;;) {
+ if (bits <= 0xffffffff)
+ break;
+ if (bits % 1000)
+ goto donez;
+ bits /= 1000;
+ }
+ n = (stbsp__uint32)bits;
+ while ((n % 1000) == 0)
+ n /= 1000;
+ bits = n;
+ donez:;
+ }
+
+ // convert to string
+ out += 64;
+ e = 0;
+ for (;;) {
+ stbsp__uint32 n;
+ char *o = out - 8;
+ // do the conversion in chunks of U32s (avoid most 64-bit divides, worth it, constant denominators be damned)
+ if (bits >= 100000000) {
+ n = (stbsp__uint32)(bits % 100000000);
+ bits /= 100000000;
+ } else {
+ n = (stbsp__uint32)bits;
+ bits = 0;
+ }
+ while (n) {
+ out -= 2;
+ *(stbsp__uint16 *)out = *(stbsp__uint16 *)&stbsp__digitpair.pair[(n % 100) * 2];
+ n /= 100;
+ e += 2;
+ }
+ if (bits == 0) {
+ if ((e) && (out[0] == '0')) {
+ ++out;
+ --e;
+ }
+ break;
+ }
+ while (out != o) {
+ *--out = '0';
+ ++e;
+ }
+ }
+
+ *decimal_pos = tens;
+ *start = out;
+ *len = e;
+ return ng;
+}
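The "rounding in integer land" step above adds half the place value of the first dropped digit and then divides it away. A minimal sketch of that step for small inputs (hypothetical helper; the real code also detects the 999-to-1000 style carry and bumps the decimal exponent):

```c
#include <assert.h>
#include <stdint.h>

/* Keep `keep` of `dg` decimal digits of `bits`, rounding half up,
   as in the frac_digits rounding step of stbsp__real_to_str. */
static uint64_t round_digits(uint64_t bits, int dg, int keep)
{
   static const uint64_t p10[] = { 1, 10, 100, 1000, 10000, 100000, 1000000 };
   uint64_t r = p10[dg - keep];  /* place value of the first dropped digit */
   return (bits + r / 2) / r;
}
```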
+
+#undef stbsp__ddmulthi
+#undef stbsp__ddrenorm
+#undef stbsp__ddmultlo
+#undef stbsp__ddmultlos
+#undef STBSP__SPECIAL
+#undef STBSP__COPYFP
+
+#endif // STB_SPRINTF_NOFLOAT
+
+// clean up
+#undef stbsp__uint16
+#undef stbsp__uint32
+#undef stbsp__int32
+#undef stbsp__uint64
+#undef stbsp__int64
+#undef STBSP__UNALIGNED
+
+#endif // STB_SPRINTF_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_textedit.h b/vendor/stb/stb_textedit.h
new file mode 100644
index 0000000..1442493
--- /dev/null
+++ b/vendor/stb/stb_textedit.h
@@ -0,0 +1,1429 @@
+// stb_textedit.h - v1.14 - public domain - Sean Barrett
+// Development of this library was sponsored by RAD Game Tools
+//
+// This C header file implements the guts of a multi-line text-editing
+// widget; you implement display, word-wrapping, and low-level string
+// insertion/deletion, and stb_textedit will map user inputs into
+// insertions & deletions, plus updates to the cursor position,
+// selection state, and undo state.
+//
+// It is intended for use in games and other systems that need to build
+// their own custom widgets and which do not have heavy text-editing
+// requirements (this library is not recommended for use for editing large
+// texts, as its performance does not scale and it has limited undo).
+//
+// Non-trivial behaviors are modelled after Windows text controls.
+//
+//
+// LICENSE
+//
+// See end of file for license information.
+//
+//
+// DEPENDENCIES
+//
+// Uses the C runtime function 'memmove', which you can override
+// by defining STB_TEXTEDIT_memmove before the implementation.
+// Uses no other functions. Performs no runtime allocations.
+//
+//
+// VERSION HISTORY
+//
+// 1.14 (2021-07-11) page up/down, various fixes
+// 1.13 (2019-02-07) fix bug in undo size management
+// 1.12 (2018-01-29) user can change STB_TEXTEDIT_KEYTYPE, fix redo to avoid crash
+// 1.11 (2017-03-03) fix HOME on last line, dragging off single-line textfield
+// 1.10 (2016-10-25) suppress warnings about casting away const with -Wcast-qual
+// 1.9 (2016-08-27) customizable move-by-word
+// 1.8 (2016-04-02) better keyboard handling when mouse button is down
+// 1.7 (2015-09-13) change y range handling in case baseline is non-0
+// 1.6 (2015-04-15) allow STB_TEXTEDIT_memmove
+// 1.5 (2014-09-10) add support for secondary keys for OS X
+// 1.4 (2014-08-17) fix signed/unsigned warnings
+// 1.3 (2014-06-19) fix mouse clicking to round to nearest char boundary
+// 1.2 (2014-05-27) fix some RAD types that had crept into the new code
+// 1.1 (2013-12-15) move-by-word (requires STB_TEXTEDIT_IS_SPACE )
+// 1.0 (2012-07-26) improve documentation, initial public release
+// 0.3 (2012-02-24) bugfixes, single-line mode; insert mode
+// 0.2 (2011-11-28) fixes to undo/redo
+// 0.1 (2010-07-08) initial version
+//
+// ADDITIONAL CONTRIBUTORS
+//
+// Ulf Winklemann: move-by-word in 1.1
+// Fabian Giesen: secondary key inputs in 1.5
+// Martins Mozeiko: STB_TEXTEDIT_memmove in 1.6
+// Louis Schnellbach: page up/down in 1.14
+//
+// Bugfixes:
+// Scott Graham
+// Daniel Keller
+// Omar Cornut
+// Dan Thompson
+//
+// USAGE
+//
+// This file behaves differently depending on what symbols you define
+// before including it.
+//
+//
+// Header-file mode:
+//
+// If you do not define STB_TEXTEDIT_IMPLEMENTATION before including this,
+// it will operate in "header file" mode. In this mode, it declares a
+// single public symbol, STB_TexteditState, which encapsulates the current
+// state of a text widget (except for the string, which you will store
+// separately).
+//
+// To compile in this mode, you must define STB_TEXTEDIT_CHARTYPE to a
+// primitive type that defines a single character (e.g. char, wchar_t, etc).
+//
+// To save space or increase undo-ability, you can optionally define the
+// following things that are used by the undo system:
+//
+// STB_TEXTEDIT_POSITIONTYPE small int type encoding a valid cursor position
+// STB_TEXTEDIT_UNDOSTATECOUNT the number of undo states to allow
+// STB_TEXTEDIT_UNDOCHARCOUNT the number of characters to store in the undo buffer
+//
+// If you don't define these, they are set to permissive types and
+// moderate sizes. The undo system does no memory allocations, so
+// it grows STB_TexteditState by the worst-case storage which is (in bytes):
+//
+// [4 + 3 * sizeof(STB_TEXTEDIT_POSITIONTYPE)] * STB_TEXTEDIT_UNDOSTATECOUNT
+// + sizeof(STB_TEXTEDIT_CHARTYPE) * STB_TEXTEDIT_UNDOCHARCOUNT
+//
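As a worked instance of the storage formula above, here is a sketch computing the worst-case undo growth for one hypothetical configuration (the type and count choices below are illustrative, not the library's defaults):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration choices. */
#define UNDOSTATECOUNT 32   /* stand-in for STB_TEXTEDIT_UNDOSTATECOUNT */
#define UNDOCHARCOUNT  256  /* stand-in for STB_TEXTEDIT_UNDOCHARCOUNT */
typedef short pos_t;        /* stand-in for STB_TEXTEDIT_POSITIONTYPE */
typedef char  char_t;       /* stand-in for STB_TEXTEDIT_CHARTYPE */

/* Worst-case undo storage in bytes, per the formula above. */
static size_t undo_storage_bytes(void)
{
   return (4 + 3 * sizeof(pos_t)) * UNDOSTATECOUNT
        + sizeof(char_t) * UNDOCHARCOUNT;
}
```

With 2-byte positions this is (4 + 6) * 32 + 256 = 576 bytes added to STB_TexteditState.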
+//
+// Implementation mode:
+//
+// If you define STB_TEXTEDIT_IMPLEMENTATION before including this, it
+// will compile the implementation of the text edit widget, depending
+// on a large number of symbols which must be defined before the include.
+//
+// The implementation is defined only as static functions. You will then
+// need to provide your own APIs in the same file which will access the
+// static functions.
+//
+// The basic concept is that you provide a "string" object which
+// behaves like an array of characters. stb_textedit uses indices to
+// refer to positions in the string, implicitly representing positions
+// in the displayed textedit. This is true for both plain text and
+// rich text; even with rich text stb_textedit interacts with your
+// code as if there were an array of all the displayed characters.
+//
+// Symbols that must be the same in header-file and implementation mode:
+//
+// STB_TEXTEDIT_CHARTYPE the character type
+// STB_TEXTEDIT_POSITIONTYPE small type that is a valid cursor position
+// STB_TEXTEDIT_UNDOSTATECOUNT the number of undo states to allow
+// STB_TEXTEDIT_UNDOCHARCOUNT the number of characters to store in the undo buffer
+//
+// Symbols you must define for implementation mode:
+//
+// STB_TEXTEDIT_STRING the type of object representing a string being edited,
+// typically this is a wrapper object with other data you need
+//
+// STB_TEXTEDIT_STRINGLEN(obj) the length of the string (ideally O(1))
+// STB_TEXTEDIT_LAYOUTROW(&r,obj,n) returns the results of laying out a line of characters
+// starting from character #n (see discussion below)
+// STB_TEXTEDIT_GETWIDTH(obj,n,i) returns the pixel delta from the xpos of the i'th character
+// to the xpos of the i+1'th char for a line of characters
+// starting at character #n (i.e. accounts for kerning
+// with previous char)
+// STB_TEXTEDIT_KEYTOTEXT(k) maps a keyboard input to an insertable character
+// (return type is int, -1 means not valid to insert)
+// STB_TEXTEDIT_GETCHAR(obj,i) returns the i'th character of obj, 0-based
+// STB_TEXTEDIT_NEWLINE the character returned by _GETCHAR() we recognize
+// as a manual line break, for end-of-line positioning
+//
+// STB_TEXTEDIT_DELETECHARS(obj,i,n) delete n characters starting at i
+// STB_TEXTEDIT_INSERTCHARS(obj,i,c*,n) insert n characters at i (pointed to by STB_TEXTEDIT_CHARTYPE*)
+//
+// STB_TEXTEDIT_K_SHIFT a power of two that is or'd in to a keyboard input to represent the shift key
+//
+// STB_TEXTEDIT_K_LEFT keyboard input to move cursor left
+// STB_TEXTEDIT_K_RIGHT keyboard input to move cursor right
+// STB_TEXTEDIT_K_UP keyboard input to move cursor up
+// STB_TEXTEDIT_K_DOWN keyboard input to move cursor down
+// STB_TEXTEDIT_K_PGUP keyboard input to move cursor up a page
+// STB_TEXTEDIT_K_PGDOWN keyboard input to move cursor down a page
+// STB_TEXTEDIT_K_LINESTART keyboard input to move cursor to start of line // e.g. HOME
+// STB_TEXTEDIT_K_LINEEND keyboard input to move cursor to end of line // e.g. END
+// STB_TEXTEDIT_K_TEXTSTART keyboard input to move cursor to start of text // e.g. ctrl-HOME
+// STB_TEXTEDIT_K_TEXTEND keyboard input to move cursor to end of text // e.g. ctrl-END
+// STB_TEXTEDIT_K_DELETE keyboard input to delete selection or character under cursor
+// STB_TEXTEDIT_K_BACKSPACE keyboard input to delete selection or character left of cursor
+// STB_TEXTEDIT_K_UNDO keyboard input to perform undo
+// STB_TEXTEDIT_K_REDO keyboard input to perform redo
+//
+// Optional:
+// STB_TEXTEDIT_K_INSERT keyboard input to toggle insert mode
+// STB_TEXTEDIT_IS_SPACE(ch) true if character is whitespace (e.g. 'isspace'),
+// required for default WORDLEFT/WORDRIGHT handlers
+// STB_TEXTEDIT_MOVEWORDLEFT(obj,i) custom handler for WORDLEFT, returns index to move cursor to
+// STB_TEXTEDIT_MOVEWORDRIGHT(obj,i) custom handler for WORDRIGHT, returns index to move cursor to
+// STB_TEXTEDIT_K_WORDLEFT keyboard input to move cursor left one word // e.g. ctrl-LEFT
+// STB_TEXTEDIT_K_WORDRIGHT keyboard input to move cursor right one word // e.g. ctrl-RIGHT
+// STB_TEXTEDIT_K_LINESTART2 secondary keyboard input to move cursor to start of line
+// STB_TEXTEDIT_K_LINEEND2 secondary keyboard input to move cursor to end of line
+// STB_TEXTEDIT_K_TEXTSTART2 secondary keyboard input to move cursor to start of text
+// STB_TEXTEDIT_K_TEXTEND2 secondary keyboard input to move cursor to end of text
+//
+// Keyboard input must be encoded as a single integer value; e.g. a character code
+// and some bitflags that represent shift states. to simplify the interface, SHIFT must
+// be a bitflag, so we can test the shifted state of cursor movements to allow selection,
+// i.e. (STB_TEXTEDIT_K_RIGHT|STB_TEXTEDIT_K_SHIFT) should be shifted right-arrow.
+//
+// You can encode other things, such as CONTROL or ALT, in additional bits, and
+// then test for their presence in e.g. STB_TEXTEDIT_K_WORDLEFT. For example,
+// my Windows implementations add an additional CONTROL bit, and an additional KEYDOWN
+// bit. Then all of the STB_TEXTEDIT_K_ values bitwise-or in the KEYDOWN bit,
+// and I pass both WM_KEYDOWN and WM_CHAR events to the "key" function in the
+// API below. The control keys will only match WM_KEYDOWN events because of the
+// keydown bit I add, and STB_TEXTEDIT_KEYTOTEXT only tests for the KEYDOWN
+// bit so it only decodes WM_CHAR events.
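+//
+// For example (a hypothetical encoding; the flag values and VK_ names below
+// are illustrative, not part of this library), you might reserve high bits:
+//
+//      #define KEYDOWN_BIT            0x40000000
+//      #define STB_TEXTEDIT_K_SHIFT   0x20000000
+//      #define STB_TEXTEDIT_K_CONTROL 0x10000000
+//      #define STB_TEXTEDIT_K_LEFT    (KEYDOWN_BIT | VK_LEFT)   // Win32 virtual key
+//      #define STB_TEXTEDIT_K_RIGHT   (KEYDOWN_BIT | VK_RIGHT)
+//      // a plain WM_CHAR code has no KEYDOWN_BIT, so it maps to itself:
+//      #define STB_TEXTEDIT_KEYTOTEXT(k) (((k) & KEYDOWN_BIT) ? -1 : (k))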
+//
+// STB_TEXTEDIT_LAYOUTROW returns information about the shape of one displayed
+// row of characters assuming they start on the i'th character--the width and
+// the height and the number of characters consumed. This allows this library
+// to traverse the entire layout incrementally. You need to compute word-wrapping
+// here.
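+//
+// A minimal sketch (assuming a hypothetical string wrapper with a 'stringlen'
+// field, a fixed-width font of CHAR_W x CHAR_H pixels, and no word-wrap, so
+// each row is simply the rest of the string):
+//
+//      void layout_func(StbTexteditRow *row, STB_TEXTEDIT_STRING *str, int start)
+//      {
+//         int remaining = str->stringlen - start;
+//         row->num_chars        = remaining;   // real code would word-wrap here
+//         row->x0               = 0;
+//         row->x1               = CHAR_W * (float) remaining;
+//         row->baseline_y_delta = CHAR_H;
+//         row->ymin             = 0;
+//         row->ymax             = CHAR_H;
+//      }
+//      #define STB_TEXTEDIT_LAYOUTROW(r,obj,n)  layout_func(r,obj,n)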
+//
+// Each textfield keeps its own insert mode state, which is not how normal
+// applications work. To keep an app-wide insert mode, update/copy the
+// "insert_mode" field of STB_TexteditState before/after calling API functions.
+//
+// API
+//
+// void stb_textedit_initialize_state(STB_TexteditState *state, int is_single_line)
+//
+// void stb_textedit_click(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, float x, float y)
+// void stb_textedit_drag(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, float x, float y)
+// int stb_textedit_cut(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+// int stb_textedit_paste(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, STB_TEXTEDIT_CHARTYPE *text, int len)
+// void stb_textedit_key(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, STB_TEXTEDIT_KEYTYPE key)
+//
+// Each of these functions potentially updates the string and updates the
+// state.
+//
+// initialize_state:
+// set the textedit state to a known good default state when initially
+// constructing the textedit.
+//
+// click:
+// call this with the mouse x,y on a mouse down; it will update the cursor
+// and reset the selection start/end to the cursor point. the x,y must
+// be relative to the text widget, with (0,0) being the top left.
+//
+// drag:
+// call this with the mouse x,y on a mouse drag/up; it will update the
+// cursor and the selection end point
+//
+// cut:
+// call this to delete the current selection; returns true if there was
+// one. you should FIRST copy the current selection to the system paste buffer.
+// (To copy, just copy the current selection out of the string yourself.)
+//
+// paste:
+// call this to paste text at the current cursor point or over the current
+// selection if there is one.
+//
+// key:
+// call this for keyboard inputs sent to the textfield. you can use it
+// for "key down" events or for "translated" key events. if you need to
+// do both (as in Win32), or distinguish Unicode characters from control
+// inputs, set a high bit to distinguish the two; then you can define the
+// various key values like STB_TEXTEDIT_K_LEFT to have the is-key-event bit
+// set, and make STB_TEXTEDIT_KEYTOTEXT check that the is-key-event bit is
+// clear. STB_TEXTEDIT_KEYTYPE defaults to int, but you can #define it to
+// any other type you want before including.
+//
+//
+// When rendering, you can read the cursor position and selection state from
+// the STB_TexteditState.
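+//
+// A typical integration (a sketch; the event structure, MOUSE_/KEY_ constants,
+// and 'str' wrapper below are all hypothetical) might look like:
+//
+//      STB_TexteditState state;
+//      stb_textedit_initialize_state(&state, 1);   // 1 = single-line
+//      ...
+//      switch (event.type) {
+//         case MOUSE_DOWN: stb_textedit_click(&str, &state, event.x, event.y); break;
+//         case MOUSE_DRAG: stb_textedit_drag (&str, &state, event.x, event.y); break;
+//         case KEY_INPUT:  stb_textedit_key  (&str, &state, event.key);        break;
+//      }
+//      // then render using state.cursor, state.select_start, state.select_end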
+//
+//
+// Notes:
+//
+// This is designed to be usable in IMGUI, so it allows for the possibility of
+// running in an IMGUI that has NOT cached the multi-line layout. For this
+// reason, it provides an interface that is compatible with computing the
+// layout incrementally--we try to make sure we make as few passes through
+// as possible. (For example, to locate the mouse pointer in the text, we
+// could define functions that return the X and Y positions of characters
+// and binary search Y and then X, but if we're doing dynamic layout this
+// will run the layout algorithm many times, so instead we manually search
+// forward in one pass. Similar logic applies to e.g. up-arrow and
+// down-arrow movement.)
+//
+// If it's run in a widget that *has* cached the layout, then this is less
+// efficient, but it's not horrible on modern computers. But you wouldn't
+// want to edit million-line files with it.
+
+
+////////////////////////////////////////////////////////////////////////////
+////////////////////////////////////////////////////////////////////////////
+////
+//// Header-file mode
+////
+////
+
+#ifndef INCLUDE_STB_TEXTEDIT_H
+#define INCLUDE_STB_TEXTEDIT_H
+
+////////////////////////////////////////////////////////////////////////
+//
+// STB_TexteditState
+//
+// Definition of STB_TexteditState which you should store
+// per-textfield; it includes cursor position, selection state,
+// and undo state.
+//
+
+#ifndef STB_TEXTEDIT_UNDOSTATECOUNT
+#define STB_TEXTEDIT_UNDOSTATECOUNT 99
+#endif
+#ifndef STB_TEXTEDIT_UNDOCHARCOUNT
+#define STB_TEXTEDIT_UNDOCHARCOUNT 999
+#endif
+#ifndef STB_TEXTEDIT_CHARTYPE
+#define STB_TEXTEDIT_CHARTYPE int
+#endif
+#ifndef STB_TEXTEDIT_POSITIONTYPE
+#define STB_TEXTEDIT_POSITIONTYPE int
+#endif
+
+typedef struct
+{
+ // private data
+ STB_TEXTEDIT_POSITIONTYPE where;
+ STB_TEXTEDIT_POSITIONTYPE insert_length;
+ STB_TEXTEDIT_POSITIONTYPE delete_length;
+ int char_storage;
+} StbUndoRecord;
+
+typedef struct
+{
+ // private data
+ StbUndoRecord undo_rec [STB_TEXTEDIT_UNDOSTATECOUNT];
+ STB_TEXTEDIT_CHARTYPE undo_char[STB_TEXTEDIT_UNDOCHARCOUNT];
+ short undo_point, redo_point;
+ int undo_char_point, redo_char_point;
+} StbUndoState;
+
+typedef struct
+{
+ /////////////////////
+ //
+ // public data
+ //
+
+ int cursor;
+ // position of the text cursor within the string
+
+ int select_start; // selection start point
+ int select_end;
+ // selection start and end point in characters; if equal, no selection.
+ // note that start may be less than or greater than end (e.g. when
+ // dragging the mouse, start is where the initial click was, and you
+ // can drag in either direction)
+
+ unsigned char insert_mode;
+ // each textfield keeps its own insert mode state. to keep an app-wide
+ // insert mode, copy this value in/out of the app state
+
+ int row_count_per_page;
+   // page size in number of rows.
+   // this value MUST be set to >0 for pageup or pagedown in multiline documents.
+
+ /////////////////////
+ //
+ // private data
+ //
+ unsigned char cursor_at_end_of_line; // not implemented yet
+ unsigned char initialized;
+ unsigned char has_preferred_x;
+ unsigned char single_line;
+ unsigned char padding1, padding2, padding3;
+ float preferred_x; // this determines where the cursor up/down tries to seek to along x
+ StbUndoState undostate;
+} STB_TexteditState;
+
+
+////////////////////////////////////////////////////////////////////////
+//
+// StbTexteditRow
+//
+// Result of layout query, used by stb_textedit to determine where
+// the text in each row is.
+
+// result of layout query
+typedef struct
+{
+ float x0,x1; // starting x location, end x location (allows for align=right, etc)
+ float baseline_y_delta; // position of baseline relative to previous row's baseline
+ float ymin,ymax; // height of row above and below baseline
+ int num_chars;
+} StbTexteditRow;
+#endif //INCLUDE_STB_TEXTEDIT_H
+
+
+////////////////////////////////////////////////////////////////////////////
+////////////////////////////////////////////////////////////////////////////
+////
+//// Implementation mode
+////
+////
+
+
+// implementation isn't include-guarded, since it might have indirectly
+// included just the "header" portion
+#ifdef STB_TEXTEDIT_IMPLEMENTATION
+
+#ifndef STB_TEXTEDIT_memmove
+#include <string.h>
+#define STB_TEXTEDIT_memmove memmove
+#endif
+
+
+/////////////////////////////////////////////////////////////////////////////
+//
+// Mouse input handling
+//
+
+// traverse the layout to locate the nearest character to a display position
+static int stb_text_locate_coord(STB_TEXTEDIT_STRING *str, float x, float y)
+{
+ StbTexteditRow r;
+ int n = STB_TEXTEDIT_STRINGLEN(str);
+ float base_y = 0, prev_x;
+ int i=0, k;
+
+ r.x0 = r.x1 = 0;
+ r.ymin = r.ymax = 0;
+ r.num_chars = 0;
+
+ // search rows to find one that straddles 'y'
+ while (i < n) {
+ STB_TEXTEDIT_LAYOUTROW(&r, str, i);
+ if (r.num_chars <= 0)
+ return n;
+
+ if (i==0 && y < base_y + r.ymin)
+ return 0;
+
+ if (y < base_y + r.ymax)
+ break;
+
+ i += r.num_chars;
+ base_y += r.baseline_y_delta;
+ }
+
+ // below all text, return 'after' last character
+ if (i >= n)
+ return n;
+
+ // check if it's before the beginning of the line
+ if (x < r.x0)
+ return i;
+
+ // check if it's before the end of the line
+ if (x < r.x1) {
+ // search characters in row for one that straddles 'x'
+ prev_x = r.x0;
+ for (k=0; k < r.num_chars; ++k) {
+ float w = STB_TEXTEDIT_GETWIDTH(str, i, k);
+ if (x < prev_x+w) {
+ if (x < prev_x+w/2)
+ return k+i;
+ else
+ return k+i+1;
+ }
+ prev_x += w;
+ }
+ // shouldn't happen, but if it does, fall through to end-of-line case
+ }
+
+ // if the last character is a newline, return that. otherwise return 'after' the last character
+ if (STB_TEXTEDIT_GETCHAR(str, i+r.num_chars-1) == STB_TEXTEDIT_NEWLINE)
+ return i+r.num_chars-1;
+ else
+ return i+r.num_chars;
+}
+
+// API click: on mouse down, move the cursor to the clicked location, and reset the selection
+static void stb_textedit_click(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, float x, float y)
+{
+   // In single-line mode, clamp y to the single row (set it to the row's ymin).
+   // This lets the click keep working if the mouse goes off the top or bottom
+   // of the text
+ if( state->single_line )
+ {
+ StbTexteditRow r;
+ STB_TEXTEDIT_LAYOUTROW(&r, str, 0);
+ y = r.ymin;
+ }
+
+ state->cursor = stb_text_locate_coord(str, x, y);
+ state->select_start = state->cursor;
+ state->select_end = state->cursor;
+ state->has_preferred_x = 0;
+}
+
+// API drag: on mouse drag, move the cursor and selection endpoint to the clicked location
+static void stb_textedit_drag(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, float x, float y)
+{
+ int p = 0;
+
+   // In single-line mode, clamp y to the single row (set it to the row's ymin).
+   // This lets the drag keep working if the mouse goes off the top or bottom
+   // of the text
+ if( state->single_line )
+ {
+ StbTexteditRow r;
+ STB_TEXTEDIT_LAYOUTROW(&r, str, 0);
+ y = r.ymin;
+ }
+
+ if (state->select_start == state->select_end)
+ state->select_start = state->cursor;
+
+ p = stb_text_locate_coord(str, x, y);
+ state->cursor = state->select_end = p;
+}
+
+/////////////////////////////////////////////////////////////////////////////
+//
+// Keyboard input handling
+//
+
+// forward declarations
+static void stb_text_undo(STB_TEXTEDIT_STRING *str, STB_TexteditState *state);
+static void stb_text_redo(STB_TEXTEDIT_STRING *str, STB_TexteditState *state);
+static void stb_text_makeundo_delete(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, int where, int length);
+static void stb_text_makeundo_insert(STB_TexteditState *state, int where, int length);
+static void stb_text_makeundo_replace(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, int where, int old_length, int new_length);
+
+typedef struct
+{
+ float x,y; // position of n'th character
+ float height; // height of line
+ int first_char, length; // first char of row, and length
+ int prev_first; // first char of previous row
+} StbFindState;
+
+// find the x/y location of a character, and remember info about the previous row in
+// case we get a move-up event (for page up, we'll have to rescan)
+static void stb_textedit_find_charpos(StbFindState *find, STB_TEXTEDIT_STRING *str, int n, int single_line)
+{
+ StbTexteditRow r;
+ int prev_start = 0;
+ int z = STB_TEXTEDIT_STRINGLEN(str);
+ int i=0, first;
+
+ if (n == z) {
+ // if it's at the end, then find the last line -- simpler than trying to
+ // explicitly handle this case in the regular code
+ if (single_line) {
+ STB_TEXTEDIT_LAYOUTROW(&r, str, 0);
+ find->y = 0;
+ find->first_char = 0;
+ find->length = z;
+ find->height = r.ymax - r.ymin;
+ find->x = r.x1;
+ } else {
+ find->y = 0;
+ find->x = 0;
+ find->height = 1;
+ while (i < z) {
+ STB_TEXTEDIT_LAYOUTROW(&r, str, i);
+ prev_start = i;
+ i += r.num_chars;
+ }
+ find->first_char = i;
+ find->length = 0;
+ find->prev_first = prev_start;
+ }
+ return;
+ }
+
+ // search rows to find the one that straddles character n
+ find->y = 0;
+
+ for(;;) {
+ STB_TEXTEDIT_LAYOUTROW(&r, str, i);
+ if (n < i + r.num_chars)
+ break;
+ prev_start = i;
+ i += r.num_chars;
+ find->y += r.baseline_y_delta;
+ }
+
+ find->first_char = first = i;
+ find->length = r.num_chars;
+ find->height = r.ymax - r.ymin;
+ find->prev_first = prev_start;
+
+ // now scan to find xpos
+ find->x = r.x0;
+ for (i=0; first+i < n; ++i)
+ find->x += STB_TEXTEDIT_GETWIDTH(str, first, i);
+}
+
+#define STB_TEXT_HAS_SELECTION(s) ((s)->select_start != (s)->select_end)
+
+// make the selection/cursor state valid if client altered the string
+static void stb_textedit_clamp(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+{
+ int n = STB_TEXTEDIT_STRINGLEN(str);
+ if (STB_TEXT_HAS_SELECTION(state)) {
+ if (state->select_start > n) state->select_start = n;
+ if (state->select_end > n) state->select_end = n;
+ // if clamping forced them to be equal, move the cursor to match
+ if (state->select_start == state->select_end)
+ state->cursor = state->select_start;
+ }
+ if (state->cursor > n) state->cursor = n;
+}
+
+// delete characters while updating undo
+static void stb_textedit_delete(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, int where, int len)
+{
+ stb_text_makeundo_delete(str, state, where, len);
+ STB_TEXTEDIT_DELETECHARS(str, where, len);
+ state->has_preferred_x = 0;
+}
+
+// delete the selection
+static void stb_textedit_delete_selection(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+{
+ stb_textedit_clamp(str, state);
+ if (STB_TEXT_HAS_SELECTION(state)) {
+ if (state->select_start < state->select_end) {
+ stb_textedit_delete(str, state, state->select_start, state->select_end - state->select_start);
+ state->select_end = state->cursor = state->select_start;
+ } else {
+ stb_textedit_delete(str, state, state->select_end, state->select_start - state->select_end);
+ state->select_start = state->cursor = state->select_end;
+ }
+ state->has_preferred_x = 0;
+ }
+}
+
+// canonicalize the selection so start <= end
+static void stb_textedit_sortselection(STB_TexteditState *state)
+{
+ if (state->select_end < state->select_start) {
+ int temp = state->select_end;
+ state->select_end = state->select_start;
+ state->select_start = temp;
+ }
+}
+
+// move cursor to first character of selection
+static void stb_textedit_move_to_first(STB_TexteditState *state)
+{
+ if (STB_TEXT_HAS_SELECTION(state)) {
+ stb_textedit_sortselection(state);
+ state->cursor = state->select_start;
+ state->select_end = state->select_start;
+ state->has_preferred_x = 0;
+ }
+}
+
+// move cursor to last character of selection
+static void stb_textedit_move_to_last(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+{
+ if (STB_TEXT_HAS_SELECTION(state)) {
+ stb_textedit_sortselection(state);
+ stb_textedit_clamp(str, state);
+ state->cursor = state->select_end;
+ state->select_start = state->select_end;
+ state->has_preferred_x = 0;
+ }
+}
+
+#ifdef STB_TEXTEDIT_IS_SPACE
+static int is_word_boundary( STB_TEXTEDIT_STRING *str, int idx )
+{
+ return idx > 0 ? (STB_TEXTEDIT_IS_SPACE( STB_TEXTEDIT_GETCHAR(str,idx-1) ) && !STB_TEXTEDIT_IS_SPACE( STB_TEXTEDIT_GETCHAR(str, idx) ) ) : 1;
+}
+
+#ifndef STB_TEXTEDIT_MOVEWORDLEFT
+static int stb_textedit_move_to_word_previous( STB_TEXTEDIT_STRING *str, int c )
+{
+ --c; // always move at least one character
+ while( c >= 0 && !is_word_boundary( str, c ) )
+ --c;
+
+ if( c < 0 )
+ c = 0;
+
+ return c;
+}
+#define STB_TEXTEDIT_MOVEWORDLEFT stb_textedit_move_to_word_previous
+#endif
+
+#ifndef STB_TEXTEDIT_MOVEWORDRIGHT
+static int stb_textedit_move_to_word_next( STB_TEXTEDIT_STRING *str, int c )
+{
+ const int len = STB_TEXTEDIT_STRINGLEN(str);
+ ++c; // always move at least one character
+ while( c < len && !is_word_boundary( str, c ) )
+ ++c;
+
+ if( c > len )
+ c = len;
+
+ return c;
+}
+#define STB_TEXTEDIT_MOVEWORDRIGHT stb_textedit_move_to_word_next
+#endif
+
+#endif
+
+// update selection and cursor to match each other
+static void stb_textedit_prep_selection_at_cursor(STB_TexteditState *state)
+{
+ if (!STB_TEXT_HAS_SELECTION(state))
+ state->select_start = state->select_end = state->cursor;
+ else
+ state->cursor = state->select_end;
+}
+
+// API cut: delete selection
+static int stb_textedit_cut(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+{
+ if (STB_TEXT_HAS_SELECTION(state)) {
+ stb_textedit_delete_selection(str,state); // implicitly clamps
+ state->has_preferred_x = 0;
+ return 1;
+ }
+ return 0;
+}
+
+// API paste: replace existing selection with passed-in text
+static int stb_textedit_paste_internal(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, STB_TEXTEDIT_CHARTYPE *text, int len)
+{
+ // if there's a selection, the paste should delete it
+ stb_textedit_clamp(str, state);
+ stb_textedit_delete_selection(str,state);
+ // try to insert the characters
+ if (STB_TEXTEDIT_INSERTCHARS(str, state->cursor, text, len)) {
+ stb_text_makeundo_insert(state, state->cursor, len);
+ state->cursor += len;
+ state->has_preferred_x = 0;
+ return 1;
+ }
+ // note: paste failure will leave deleted selection, may be restored with an undo (see https://github.com/nothings/stb/issues/734 for details)
+ return 0;
+}
+
+#ifndef STB_TEXTEDIT_KEYTYPE
+#define STB_TEXTEDIT_KEYTYPE int
+#endif
+
+// API key: process a keyboard input
+static void stb_textedit_key(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, STB_TEXTEDIT_KEYTYPE key)
+{
+retry:
+ switch (key) {
+ default: {
+ int c = STB_TEXTEDIT_KEYTOTEXT(key);
+ if (c > 0) {
+ STB_TEXTEDIT_CHARTYPE ch = (STB_TEXTEDIT_CHARTYPE) c;
+
+ // can't add newline in single-line mode
+ if (c == '\n' && state->single_line)
+ break;
+
+ if (state->insert_mode && !STB_TEXT_HAS_SELECTION(state) && state->cursor < STB_TEXTEDIT_STRINGLEN(str)) {
+ stb_text_makeundo_replace(str, state, state->cursor, 1, 1);
+ STB_TEXTEDIT_DELETECHARS(str, state->cursor, 1);
+ if (STB_TEXTEDIT_INSERTCHARS(str, state->cursor, &ch, 1)) {
+ ++state->cursor;
+ state->has_preferred_x = 0;
+ }
+ } else {
+ stb_textedit_delete_selection(str,state); // implicitly clamps
+ if (STB_TEXTEDIT_INSERTCHARS(str, state->cursor, &ch, 1)) {
+ stb_text_makeundo_insert(state, state->cursor, 1);
+ ++state->cursor;
+ state->has_preferred_x = 0;
+ }
+ }
+ }
+ break;
+ }
+
+#ifdef STB_TEXTEDIT_K_INSERT
+ case STB_TEXTEDIT_K_INSERT:
+ state->insert_mode = !state->insert_mode;
+ break;
+#endif
+
+ case STB_TEXTEDIT_K_UNDO:
+ stb_text_undo(str, state);
+ state->has_preferred_x = 0;
+ break;
+
+ case STB_TEXTEDIT_K_REDO:
+ stb_text_redo(str, state);
+ state->has_preferred_x = 0;
+ break;
+
+ case STB_TEXTEDIT_K_LEFT:
+ // if currently there's a selection, move cursor to start of selection
+ if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_move_to_first(state);
+ else
+ if (state->cursor > 0)
+ --state->cursor;
+ state->has_preferred_x = 0;
+ break;
+
+ case STB_TEXTEDIT_K_RIGHT:
+ // if currently there's a selection, move cursor to end of selection
+ if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_move_to_last(str, state);
+ else
+ ++state->cursor;
+ stb_textedit_clamp(str, state);
+ state->has_preferred_x = 0;
+ break;
+
+ case STB_TEXTEDIT_K_LEFT | STB_TEXTEDIT_K_SHIFT:
+ stb_textedit_clamp(str, state);
+ stb_textedit_prep_selection_at_cursor(state);
+ // move selection left
+ if (state->select_end > 0)
+ --state->select_end;
+ state->cursor = state->select_end;
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_MOVEWORDLEFT
+ case STB_TEXTEDIT_K_WORDLEFT:
+ if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_move_to_first(state);
+ else {
+ state->cursor = STB_TEXTEDIT_MOVEWORDLEFT(str, state->cursor);
+ stb_textedit_clamp( str, state );
+ }
+ break;
+
+ case STB_TEXTEDIT_K_WORDLEFT | STB_TEXTEDIT_K_SHIFT:
+ if( !STB_TEXT_HAS_SELECTION( state ) )
+ stb_textedit_prep_selection_at_cursor(state);
+
+ state->cursor = STB_TEXTEDIT_MOVEWORDLEFT(str, state->cursor);
+ state->select_end = state->cursor;
+
+ stb_textedit_clamp( str, state );
+ break;
+#endif
+
+#ifdef STB_TEXTEDIT_MOVEWORDRIGHT
+ case STB_TEXTEDIT_K_WORDRIGHT:
+ if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_move_to_last(str, state);
+ else {
+ state->cursor = STB_TEXTEDIT_MOVEWORDRIGHT(str, state->cursor);
+ stb_textedit_clamp( str, state );
+ }
+ break;
+
+ case STB_TEXTEDIT_K_WORDRIGHT | STB_TEXTEDIT_K_SHIFT:
+ if( !STB_TEXT_HAS_SELECTION( state ) )
+ stb_textedit_prep_selection_at_cursor(state);
+
+ state->cursor = STB_TEXTEDIT_MOVEWORDRIGHT(str, state->cursor);
+ state->select_end = state->cursor;
+
+ stb_textedit_clamp( str, state );
+ break;
+#endif
+
+ case STB_TEXTEDIT_K_RIGHT | STB_TEXTEDIT_K_SHIFT:
+ stb_textedit_prep_selection_at_cursor(state);
+ // move selection right
+ ++state->select_end;
+ stb_textedit_clamp(str, state);
+ state->cursor = state->select_end;
+ state->has_preferred_x = 0;
+ break;
+
+ case STB_TEXTEDIT_K_DOWN:
+ case STB_TEXTEDIT_K_DOWN | STB_TEXTEDIT_K_SHIFT:
+ case STB_TEXTEDIT_K_PGDOWN:
+ case STB_TEXTEDIT_K_PGDOWN | STB_TEXTEDIT_K_SHIFT: {
+ StbFindState find;
+ StbTexteditRow row;
+ int i, j, sel = (key & STB_TEXTEDIT_K_SHIFT) != 0;
+ int is_page = (key & ~STB_TEXTEDIT_K_SHIFT) == STB_TEXTEDIT_K_PGDOWN;
+ int row_count = is_page ? state->row_count_per_page : 1;
+
+ if (!is_page && state->single_line) {
+ // on windows, up&down in single-line behave like left&right
+ key = STB_TEXTEDIT_K_RIGHT | (key & STB_TEXTEDIT_K_SHIFT);
+ goto retry;
+ }
+
+ if (sel)
+ stb_textedit_prep_selection_at_cursor(state);
+ else if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_move_to_last(str, state);
+
+ // compute current position of cursor point
+ stb_textedit_clamp(str, state);
+ stb_textedit_find_charpos(&find, str, state->cursor, state->single_line);
+
+ for (j = 0; j < row_count; ++j) {
+ float x, goal_x = state->has_preferred_x ? state->preferred_x : find.x;
+ int start = find.first_char + find.length;
+
+ if (find.length == 0)
+ break;
+
+ // now find character position down a row
+ state->cursor = start;
+ STB_TEXTEDIT_LAYOUTROW(&row, str, state->cursor);
+ x = row.x0;
+ for (i=0; i < row.num_chars; ++i) {
+ float dx = STB_TEXTEDIT_GETWIDTH(str, start, i);
+ #ifdef STB_TEXTEDIT_GETWIDTH_NEWLINE
+ if (dx == STB_TEXTEDIT_GETWIDTH_NEWLINE)
+ break;
+ #endif
+ x += dx;
+ if (x > goal_x)
+ break;
+ ++state->cursor;
+ }
+ stb_textedit_clamp(str, state);
+
+ state->has_preferred_x = 1;
+ state->preferred_x = goal_x;
+
+ if (sel)
+ state->select_end = state->cursor;
+
+ // go to next line
+ find.first_char = find.first_char + find.length;
+ find.length = row.num_chars;
+ }
+ break;
+ }
+
+ case STB_TEXTEDIT_K_UP:
+ case STB_TEXTEDIT_K_UP | STB_TEXTEDIT_K_SHIFT:
+ case STB_TEXTEDIT_K_PGUP:
+ case STB_TEXTEDIT_K_PGUP | STB_TEXTEDIT_K_SHIFT: {
+ StbFindState find;
+ StbTexteditRow row;
+ int i, j, prev_scan, sel = (key & STB_TEXTEDIT_K_SHIFT) != 0;
+ int is_page = (key & ~STB_TEXTEDIT_K_SHIFT) == STB_TEXTEDIT_K_PGUP;
+ int row_count = is_page ? state->row_count_per_page : 1;
+
+ if (!is_page && state->single_line) {
+ // on windows, up&down become left&right
+ key = STB_TEXTEDIT_K_LEFT | (key & STB_TEXTEDIT_K_SHIFT);
+ goto retry;
+ }
+
+ if (sel)
+ stb_textedit_prep_selection_at_cursor(state);
+ else if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_move_to_first(state);
+
+ // compute current position of cursor point
+ stb_textedit_clamp(str, state);
+ stb_textedit_find_charpos(&find, str, state->cursor, state->single_line);
+
+ for (j = 0; j < row_count; ++j) {
+ float x, goal_x = state->has_preferred_x ? state->preferred_x : find.x;
+
+ // can only go up if there's a previous row
+ if (find.prev_first == find.first_char)
+ break;
+
+ // now find character position up a row
+ state->cursor = find.prev_first;
+ STB_TEXTEDIT_LAYOUTROW(&row, str, state->cursor);
+ x = row.x0;
+ for (i=0; i < row.num_chars; ++i) {
+ float dx = STB_TEXTEDIT_GETWIDTH(str, find.prev_first, i);
+ #ifdef STB_TEXTEDIT_GETWIDTH_NEWLINE
+ if (dx == STB_TEXTEDIT_GETWIDTH_NEWLINE)
+ break;
+ #endif
+ x += dx;
+ if (x > goal_x)
+ break;
+ ++state->cursor;
+ }
+ stb_textedit_clamp(str, state);
+
+ state->has_preferred_x = 1;
+ state->preferred_x = goal_x;
+
+ if (sel)
+ state->select_end = state->cursor;
+
+ // go to previous line
+ // (we need to scan previous line the hard way. maybe we could expose this as a new API function?)
+ prev_scan = find.prev_first > 0 ? find.prev_first - 1 : 0;
+ while (prev_scan > 0 && STB_TEXTEDIT_GETCHAR(str, prev_scan - 1) != STB_TEXTEDIT_NEWLINE)
+ --prev_scan;
+ find.first_char = find.prev_first;
+ find.prev_first = prev_scan;
+ }
+ break;
+ }
+
+ case STB_TEXTEDIT_K_DELETE:
+ case STB_TEXTEDIT_K_DELETE | STB_TEXTEDIT_K_SHIFT:
+ if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_delete_selection(str, state);
+ else {
+ int n = STB_TEXTEDIT_STRINGLEN(str);
+ if (state->cursor < n)
+ stb_textedit_delete(str, state, state->cursor, 1);
+ }
+ state->has_preferred_x = 0;
+ break;
+
+ case STB_TEXTEDIT_K_BACKSPACE:
+ case STB_TEXTEDIT_K_BACKSPACE | STB_TEXTEDIT_K_SHIFT:
+ if (STB_TEXT_HAS_SELECTION(state))
+ stb_textedit_delete_selection(str, state);
+ else {
+ stb_textedit_clamp(str, state);
+ if (state->cursor > 0) {
+ stb_textedit_delete(str, state, state->cursor-1, 1);
+ --state->cursor;
+ }
+ }
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_K_TEXTSTART2
+ case STB_TEXTEDIT_K_TEXTSTART2:
+#endif
+ case STB_TEXTEDIT_K_TEXTSTART:
+ state->cursor = state->select_start = state->select_end = 0;
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_K_TEXTEND2
+ case STB_TEXTEDIT_K_TEXTEND2:
+#endif
+ case STB_TEXTEDIT_K_TEXTEND:
+ state->cursor = STB_TEXTEDIT_STRINGLEN(str);
+ state->select_start = state->select_end = 0;
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_K_TEXTSTART2
+ case STB_TEXTEDIT_K_TEXTSTART2 | STB_TEXTEDIT_K_SHIFT:
+#endif
+ case STB_TEXTEDIT_K_TEXTSTART | STB_TEXTEDIT_K_SHIFT:
+ stb_textedit_prep_selection_at_cursor(state);
+ state->cursor = state->select_end = 0;
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_K_TEXTEND2
+ case STB_TEXTEDIT_K_TEXTEND2 | STB_TEXTEDIT_K_SHIFT:
+#endif
+ case STB_TEXTEDIT_K_TEXTEND | STB_TEXTEDIT_K_SHIFT:
+ stb_textedit_prep_selection_at_cursor(state);
+ state->cursor = state->select_end = STB_TEXTEDIT_STRINGLEN(str);
+ state->has_preferred_x = 0;
+ break;
+
+
+#ifdef STB_TEXTEDIT_K_LINESTART2
+ case STB_TEXTEDIT_K_LINESTART2:
+#endif
+ case STB_TEXTEDIT_K_LINESTART:
+ stb_textedit_clamp(str, state);
+ stb_textedit_move_to_first(state);
+ if (state->single_line)
+ state->cursor = 0;
+ else while (state->cursor > 0 && STB_TEXTEDIT_GETCHAR(str, state->cursor-1) != STB_TEXTEDIT_NEWLINE)
+ --state->cursor;
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_K_LINEEND2
+ case STB_TEXTEDIT_K_LINEEND2:
+#endif
+ case STB_TEXTEDIT_K_LINEEND: {
+ int n = STB_TEXTEDIT_STRINGLEN(str);
+ stb_textedit_clamp(str, state);
+ stb_textedit_move_to_first(state);
+ if (state->single_line)
+ state->cursor = n;
+ else while (state->cursor < n && STB_TEXTEDIT_GETCHAR(str, state->cursor) != STB_TEXTEDIT_NEWLINE)
+ ++state->cursor;
+ state->has_preferred_x = 0;
+ break;
+ }
+
+#ifdef STB_TEXTEDIT_K_LINESTART2
+ case STB_TEXTEDIT_K_LINESTART2 | STB_TEXTEDIT_K_SHIFT:
+#endif
+ case STB_TEXTEDIT_K_LINESTART | STB_TEXTEDIT_K_SHIFT:
+ stb_textedit_clamp(str, state);
+ stb_textedit_prep_selection_at_cursor(state);
+ if (state->single_line)
+ state->cursor = 0;
+ else while (state->cursor > 0 && STB_TEXTEDIT_GETCHAR(str, state->cursor-1) != STB_TEXTEDIT_NEWLINE)
+ --state->cursor;
+ state->select_end = state->cursor;
+ state->has_preferred_x = 0;
+ break;
+
+#ifdef STB_TEXTEDIT_K_LINEEND2
+ case STB_TEXTEDIT_K_LINEEND2 | STB_TEXTEDIT_K_SHIFT:
+#endif
+ case STB_TEXTEDIT_K_LINEEND | STB_TEXTEDIT_K_SHIFT: {
+ int n = STB_TEXTEDIT_STRINGLEN(str);
+ stb_textedit_clamp(str, state);
+ stb_textedit_prep_selection_at_cursor(state);
+ if (state->single_line)
+ state->cursor = n;
+ else while (state->cursor < n && STB_TEXTEDIT_GETCHAR(str, state->cursor) != STB_TEXTEDIT_NEWLINE)
+ ++state->cursor;
+ state->select_end = state->cursor;
+ state->has_preferred_x = 0;
+ break;
+ }
+ }
+}
+
+/////////////////////////////////////////////////////////////////////////////
+//
+// Undo processing
+//
+// @OPTIMIZE: the undo/redo buffer should be circular
+
+static void stb_textedit_flush_redo(StbUndoState *state)
+{
+ state->redo_point = STB_TEXTEDIT_UNDOSTATECOUNT;
+ state->redo_char_point = STB_TEXTEDIT_UNDOCHARCOUNT;
+}
+
+// discard the oldest entry in the undo list
+static void stb_textedit_discard_undo(StbUndoState *state)
+{
+ if (state->undo_point > 0) {
+ // if the 0th undo state has characters, clean those up
+ if (state->undo_rec[0].char_storage >= 0) {
+ int n = state->undo_rec[0].insert_length, i;
+ // delete n characters from all other records
+ state->undo_char_point -= n;
+ STB_TEXTEDIT_memmove(state->undo_char, state->undo_char + n, (size_t) (state->undo_char_point*sizeof(STB_TEXTEDIT_CHARTYPE)));
+ for (i=0; i < state->undo_point; ++i)
+ if (state->undo_rec[i].char_storage >= 0)
+ state->undo_rec[i].char_storage -= n; // @OPTIMIZE: get rid of char_storage and infer it
+ }
+ --state->undo_point;
+ STB_TEXTEDIT_memmove(state->undo_rec, state->undo_rec+1, (size_t) (state->undo_point*sizeof(state->undo_rec[0])));
+ }
+}
+
+// discard the oldest entry in the redo list--it's bad if this
+// ever happens, but because undo & redo have to store the actual
+// characters in different cases, the redo character buffer can
+// fill up even though the undo buffer didn't
+static void stb_textedit_discard_redo(StbUndoState *state)
+{
+ int k = STB_TEXTEDIT_UNDOSTATECOUNT-1;
+
+ if (state->redo_point <= k) {
+ // if the k'th undo state has characters, clean those up
+ if (state->undo_rec[k].char_storage >= 0) {
+ int n = state->undo_rec[k].insert_length, i;
+ // move the remaining redo character data to the end of the buffer
+ state->redo_char_point += n;
+ STB_TEXTEDIT_memmove(state->undo_char + state->redo_char_point, state->undo_char + state->redo_char_point-n, (size_t) ((STB_TEXTEDIT_UNDOCHARCOUNT - state->redo_char_point)*sizeof(STB_TEXTEDIT_CHARTYPE)));
+ // adjust the position of all the other records to account for above memmove
+ for (i=state->redo_point; i < k; ++i)
+ if (state->undo_rec[i].char_storage >= 0)
+ state->undo_rec[i].char_storage += n;
+ }
+ // now move all the redo records towards the end of the buffer; the first one is at 'redo_point'
+ STB_TEXTEDIT_memmove(state->undo_rec + state->redo_point+1, state->undo_rec + state->redo_point, (size_t) ((STB_TEXTEDIT_UNDOSTATECOUNT - state->redo_point)*sizeof(state->undo_rec[0])));
+ // now move redo_point to point to the new one
+ ++state->redo_point;
+ }
+}
+
+static StbUndoRecord *stb_text_create_undo_record(StbUndoState *state, int numchars)
+{
+ // any time we create a new undo record, we discard redo
+ stb_textedit_flush_redo(state);
+
+ // if we have no free records, we have to make room, by sliding the
+ // existing records down
+ if (state->undo_point == STB_TEXTEDIT_UNDOSTATECOUNT)
+ stb_textedit_discard_undo(state);
+
+ // if the characters to store won't possibly fit in the buffer, we can't undo
+ if (numchars > STB_TEXTEDIT_UNDOCHARCOUNT) {
+ state->undo_point = 0;
+ state->undo_char_point = 0;
+ return NULL;
+ }
+
+ // if we don't have enough free characters in the buffer, we have to make room
+ while (state->undo_char_point + numchars > STB_TEXTEDIT_UNDOCHARCOUNT)
+ stb_textedit_discard_undo(state);
+
+ return &state->undo_rec[state->undo_point++];
+}
+
+static STB_TEXTEDIT_CHARTYPE *stb_text_createundo(StbUndoState *state, int pos, int insert_len, int delete_len)
+{
+ StbUndoRecord *r = stb_text_create_undo_record(state, insert_len);
+ if (r == NULL)
+ return NULL;
+
+ r->where = pos;
+ r->insert_length = (STB_TEXTEDIT_POSITIONTYPE) insert_len;
+ r->delete_length = (STB_TEXTEDIT_POSITIONTYPE) delete_len;
+
+ if (insert_len == 0) {
+ r->char_storage = -1;
+ return NULL;
+ } else {
+ r->char_storage = state->undo_char_point;
+ state->undo_char_point += insert_len;
+ return &state->undo_char[r->char_storage];
+ }
+}
+
+static void stb_text_undo(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+{
+ StbUndoState *s = &state->undostate;
+ StbUndoRecord u, *r;
+ if (s->undo_point == 0)
+ return;
+
+ // we need to do two things: apply the undo record, and create a redo record
+ u = s->undo_rec[s->undo_point-1];
+ r = &s->undo_rec[s->redo_point-1];
+ r->char_storage = -1;
+
+ r->insert_length = u.delete_length;
+ r->delete_length = u.insert_length;
+ r->where = u.where;
+
+ if (u.delete_length) {
+ // if the undo record says to delete characters, then the redo record will
+ // need to re-insert the characters that get deleted, so we need to store
+ // them.
+
+ // there are three cases:
+ // there's enough room to store the characters
+ // characters stored for *redoing* don't leave room for redo
+ // characters stored for *undoing* don't leave room for redo
+ // if the last is true, we have to bail
+
+ if (s->undo_char_point + u.delete_length >= STB_TEXTEDIT_UNDOCHARCOUNT) {
+ // the undo records take up too much character space; there's no space to store the redo characters
+ r->insert_length = 0;
+ } else {
+ int i;
+
+ // there's definitely room to store the characters eventually
+ while (s->undo_char_point + u.delete_length > s->redo_char_point) {
+ // should never happen:
+ if (s->redo_point == STB_TEXTEDIT_UNDOSTATECOUNT)
+ return;
+ // there's currently not enough room, so discard a redo record
+ stb_textedit_discard_redo(s);
+ }
+ r = &s->undo_rec[s->redo_point-1];
+
+ r->char_storage = s->redo_char_point - u.delete_length;
+ s->redo_char_point = s->redo_char_point - u.delete_length;
+
+ // now save the characters
+ for (i=0; i < u.delete_length; ++i)
+ s->undo_char[r->char_storage + i] = STB_TEXTEDIT_GETCHAR(str, u.where + i);
+ }
+
+ // now we can carry out the deletion
+ STB_TEXTEDIT_DELETECHARS(str, u.where, u.delete_length);
+ }
+
+ // check type of recorded action:
+ if (u.insert_length) {
+ // easy case: was a deletion, so we need to insert n characters
+ STB_TEXTEDIT_INSERTCHARS(str, u.where, &s->undo_char[u.char_storage], u.insert_length);
+ s->undo_char_point -= u.insert_length;
+ }
+
+ state->cursor = u.where + u.insert_length;
+
+ s->undo_point--;
+ s->redo_point--;
+}
+
+static void stb_text_redo(STB_TEXTEDIT_STRING *str, STB_TexteditState *state)
+{
+ StbUndoState *s = &state->undostate;
+ StbUndoRecord *u, r;
+ if (s->redo_point == STB_TEXTEDIT_UNDOSTATECOUNT)
+ return;
+
+ // we need to do two things: apply the redo record, and create an undo record
+ u = &s->undo_rec[s->undo_point];
+ r = s->undo_rec[s->redo_point];
+
+ // we KNOW there must be room for the undo record, because the redo record
+ // was derived from an undo record
+
+ u->delete_length = r.insert_length;
+ u->insert_length = r.delete_length;
+ u->where = r.where;
+ u->char_storage = -1;
+
+ if (r.delete_length) {
+ // the redo record requires us to delete characters, so the undo record
+ // needs to store the characters
+
+ if (s->undo_char_point + u->insert_length > s->redo_char_point) {
+ u->insert_length = 0;
+ u->delete_length = 0;
+ } else {
+ int i;
+ u->char_storage = s->undo_char_point;
+ s->undo_char_point = s->undo_char_point + u->insert_length;
+
+ // now save the characters
+ for (i=0; i < u->insert_length; ++i)
+ s->undo_char[u->char_storage + i] = STB_TEXTEDIT_GETCHAR(str, u->where + i);
+ }
+
+ STB_TEXTEDIT_DELETECHARS(str, r.where, r.delete_length);
+ }
+
+ if (r.insert_length) {
+ // easy case: need to insert n characters
+ STB_TEXTEDIT_INSERTCHARS(str, r.where, &s->undo_char[r.char_storage], r.insert_length);
+ s->redo_char_point += r.insert_length;
+ }
+
+ state->cursor = r.where + r.insert_length;
+
+ s->undo_point++;
+ s->redo_point++;
+}
+
+static void stb_text_makeundo_insert(STB_TexteditState *state, int where, int length)
+{
+ stb_text_createundo(&state->undostate, where, 0, length);
+}
+
+static void stb_text_makeundo_delete(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, int where, int length)
+{
+ int i;
+ STB_TEXTEDIT_CHARTYPE *p = stb_text_createundo(&state->undostate, where, length, 0);
+ if (p) {
+ for (i=0; i < length; ++i)
+ p[i] = STB_TEXTEDIT_GETCHAR(str, where+i);
+ }
+}
+
+static void stb_text_makeundo_replace(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, int where, int old_length, int new_length)
+{
+ int i;
+ STB_TEXTEDIT_CHARTYPE *p = stb_text_createundo(&state->undostate, where, old_length, new_length);
+ if (p) {
+ for (i=0; i < old_length; ++i)
+ p[i] = STB_TEXTEDIT_GETCHAR(str, where+i);
+ }
+}
+
+// reset the state to default
+static void stb_textedit_clear_state(STB_TexteditState *state, int is_single_line)
+{
+ state->undostate.undo_point = 0;
+ state->undostate.undo_char_point = 0;
+ state->undostate.redo_point = STB_TEXTEDIT_UNDOSTATECOUNT;
+ state->undostate.redo_char_point = STB_TEXTEDIT_UNDOCHARCOUNT;
+ state->select_end = state->select_start = 0;
+ state->cursor = 0;
+ state->has_preferred_x = 0;
+ state->preferred_x = 0;
+ state->cursor_at_end_of_line = 0;
+ state->initialized = 1;
+ state->single_line = (unsigned char) is_single_line;
+ state->insert_mode = 0;
+ state->row_count_per_page = 0;
+}
+
+// API initialize
+static void stb_textedit_initialize_state(STB_TexteditState *state, int is_single_line)
+{
+ stb_textedit_clear_state(state, is_single_line);
+}
+
+#if defined(__GNUC__) || defined(__clang__)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+static int stb_textedit_paste(STB_TEXTEDIT_STRING *str, STB_TexteditState *state, STB_TEXTEDIT_CHARTYPE const *ctext, int len)
+{
+ return stb_textedit_paste_internal(str, state, (STB_TEXTEDIT_CHARTYPE *) ctext, len);
+}
+
+#if defined(__GNUC__) || defined(__clang__)
+#pragma GCC diagnostic pop
+#endif
+
+#endif//STB_TEXTEDIT_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_tilemap_editor.h b/vendor/stb/stb_tilemap_editor.h
new file mode 100644
index 0000000..fbd3388
--- /dev/null
+++ b/vendor/stb/stb_tilemap_editor.h
@@ -0,0 +1,4187 @@
+// stb_tilemap_editor.h - v0.42 - Sean Barrett - http://nothings.org/stb
+// placed in the public domain - not copyrighted - first released 2014-09
+//
+// Embeddable tilemap editor for C/C++
+//
+//
+// TABLE OF CONTENTS
+// FAQ
+// How to compile/use the library
+// Additional configuration macros
+// API documentation
+// Info on editing multiple levels
+// Revision history
+// Todo
+// Credits
+// License
+//
+//
+// FAQ
+//
+// Q: What counts as a tilemap for this library?
+//
+// A: An array of rectangles, where each rectangle contains a small
+// stack of images.
+//
+// Q: What are the limitations?
+//
+// A: Maps are limited to 4096x4096 in dimension.
+// Each map square can only contain a stack of at most 32 images.
+// A map can only use up to 32768 distinct image tiles.
+//
+// Q: How do I compile this?
+//
+// A: You need to #define several symbols before #including it, but only
+// in one file. This will cause all the function definitions to be
+// generated in that file. See the "HOW TO COMPILE" section.
+//
+// Q: What advantages does this have over a standalone editor?
+//
+// A: For one, you can integrate the editor into your game so you can
+// flip between editing and testing without even switching windows.
+// For another, you don't need an XML parser to get at the map data.
+//
+// Q: Can I live-edit my game maps?
+//
+// A: Not really; the editor keeps its own map representation.
+//
+// Q: How do I save and load maps?
+//
+// A: You have to do this yourself. The editor provides serialization
+// functions (get & set) for reading and writing the map it holds.
+// You can choose whatever format you want to store the map to on
+// disk; you just need to provide functions to convert. (For example,
+// I actually store the editor's map representation to disk basically
+// as-is; then I have a single function that converts from the editor
+// map representation to the game representation, which is used both
+// to go from editor-to-game and from loaded-map-to-game.)
+//
+// Q: I want to have tiles change appearance based on what's
+// adjacent, or other tile-display/substitution trickiness.
+//
+// A: You can do this when you convert from the editor's map
+// representation to the game representation, but there's
+// no way to show this live in the editor.
+//
+// Q: The editor appears to put map location (0,0) at the top left?
+// I want to use a different coordinate system in my game (e.g. y
+// increasing upwards, or origin at the center).
+//
+// A: You can do this when you convert from the editor's map
+// representation to the game representation. (Don't forget to
+// translate link coordinates as well!)
+//
+// Q: The editor appears to put pixel (0,0) at the top left? I want
+// to use a different coordinate system in my game.
+//
+// A: The editor defines an "editor pixel coordinate system" with
+// (0,0) at the top left and requires you to display things in
+// that coordinate system. You can freely remap those coordinates
+// to anything you want on screen.
+//
+// Q: How do I scale the user interface?
+//
+// A: Since you do all the rendering, you can scale up all the rendering
+// calls that the library makes to you. If you do, (a) you need
+// to also scale up the mouse coordinates, and (b) you may want
+// to scale the map display back down so that you're only scaling
+// the UI and not everything. See the next question.
+//
+// Q: How do I scale the map display?
+//
+// A: Use stbte_set_spacing() to change the size that the map is displayed
+// at. Note that the "callbacks" to draw tiles are used for both drawing
+// the map and drawing the tile palette, so that callback may need to
+// draw at two different scales. You should choose the scales to match
+// the map and palette spacings you have set. You can tell the two
+// cases apart because the tile palette gets NULL for the property pointer.
+//
+// Q: How does object editing work?
+//
+// A: One way to think of this is that in the editor, you're placing
+// spawners, not objects. Each spawner must be tile-aligned, because
+// it's only a tile editor. Each tile (stack of layers) gets
+// an associated set of properties, and it's up to you to
+// determine what properties should appear for a given tile,
+// based on e.g. the spawners that are in it.
+//
+// Q: How are properties themselves handled?
+//
+// A: All properties, regardless of UI behavior, are internally floats.
+// Each tile has an array of floats associated with it, which is
+// passed back to you when drawing the tiles so you can draw
+// objects appropriately modified by the properties.
+//
+// Q: What if I want to have two different objects/spawners in
+// one tile, both of which have their own properties?
+//
+// A: Make sure STBTE_MAX_PROPERTIES is large enough for the sum of
+// properties in both objects, and then you have to explicitly
+// map the property slot #s to the appropriate objects. They'll
+// still all appear in a single property panel; there's no way
+// to get multiple panels.
+//
+// Q: Can I do one-to-many linking?
+//
+// A: The library only supports one link per tile. However, you
+// can have multiple tiles all link to a single tile. So, you
+// can fake one-to-many linking by linking in the reverse
+// direction.
+//
+// Q: What if I have two objects in the same tile, and they each
+// need an independent link? Or I have two kinds of link associated
+// with a single object?
+//
+// A: There is no way to do this. (Unless you can reverse one link.)
+//
+// Q: How does cut & paste interact with object properties & links?
+//
+// A: Currently the library has no idea which properties or links
+// are associated with which layers of a tile. So currently, the
+// library will only copy properties & links if the layer panel
+// is set to allow all layers to be copied, OR if you set the
+// "props" in the layer panel to "always". Similarly, you can
+// set "props" to "none" so it will never copy.
+//
+// Q: What happens if the library gets a memory allocation failure
+// while I'm editing? Will I lose my work?
+//
+// A: The library allocates all editor memory when you create
+// the tilemap. It allocates a maximally-sized map and a
+// fixed-size undo buffer (and the fixed-size copy buffer
+// is static), and never allocates memory while it's running.
+// So it can't fail due to running out of memory.
+//
+// Q: What happens if the library crashes while I'm editing? Will
+// I lose my work?
+//
+// A: Yes. Save often.
+//
+//
+// HOW TO COMPILE
+//
+// This header file contains both the header file and the
+// implementation file in one. To create the implementation,
+// in one source file define a few symbols first and then
+// include this header:
+//
+// #define STB_TILEMAP_EDITOR_IMPLEMENTATION
+// // this triggers the implementation
+//
+// void STBTE_DRAW_RECT(int x0, int y0, int x1, int y1, unsigned int color);
+// // this must draw a filled rectangle (exclusive on right/bottom)
+// // color = (r<<16)|(g<<8)|(b)
+//
+// void STBTE_DRAW_TILE(int x0, int y0,
+// unsigned short id, int highlight, float *data);
+// // this draws the tile image identified by 'id' in one of several
+// // highlight modes (see STBTE_drawmode_* in the header section);
+// // if 'data' is NULL, it's drawing the tile in the palette; if 'data'
+// // is not NULL, it's drawing a tile on the map, and that is the data
+// // associated with that map tile
+//
+// #include "stb_tilemap_editor.h"
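A minimal implementation file following the steps above might look like the sketch below. The `draw_box` and `draw_sprite` calls are placeholders for whatever your own renderer provides; only the macro names and signatures come from this header.

```c
/* One translation unit that generates the editor implementation.
   draw_box / draw_sprite are hypothetical stand-ins for your renderer. */
#define STB_TILEMAP_EDITOR_IMPLEMENTATION

static void my_draw_rect(int x0, int y0, int x1, int y1, unsigned int color)
{
   /* color is 0xRRGGBB; right/bottom edges are exclusive */
   draw_box(x0, y0, x1, y1, color);
}

static void my_draw_tile(int x0, int y0, unsigned short id, int highlight, float *data)
{
   /* data == NULL means the tile is being drawn in the palette,
      otherwise it's the property array for that map tile */
   draw_sprite(x0, y0, id, highlight);
   (void) data;
}

#define STBTE_DRAW_RECT(x0,y0,x1,y1,c)    my_draw_rect(x0,y0,x1,y1,c)
#define STBTE_DRAW_TILE(x0,y0,id,hl,d)    my_draw_tile(x0,y0,id,hl,d)

#include "stb_tilemap_editor.h"
```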
+//
+// Optionally you can define the following functions before the include;
+// note these must be macros (but they can just call a function) so
+// this library can #ifdef to detect if you've defined them:
+//
+// #define STBTE_PROP_TYPE(int n, short *tiledata, float *params) ...
+// // Returns the type of the n'th property of a given tile, which
+// // controls how it is edited. Legal types are:
+// // 0 /* no editable property in this slot */
+// // STBTE_PROP_int /* uses a slider to adjust value */
+// // STBTE_PROP_float /* uses a weird multi-axis control */
+// // STBTE_PROP_bool /* uses a checkbox to change value */
+// // And you can bitwise-OR in the following flags:
+// // STBTE_PROP_disabled
+// // Note that all of these are stored as floats in the param array.
+// // The integer slider is limited in precision based on the space
+// // available on screen, so for wide-ranged integers you may want
+// // to use floats instead.
+// //
+// // Since the tiledata is passed to you, you can choose which property
+// // is bound to that slot based on that data.
+// //
+// // Changing the type of a parameter does not cause the underlying
+// // value to be clamped to the type min/max except when the tile is
+// // explicitly selected.
+//
+// #define STBTE_PROP_NAME(int n, short *tiledata, float *params) ...
+// // this returns a string with the name for slot #n in the float
+// // property list for the tile.
+//
+// #define STBTE_PROP_MIN(int n, short *tiledata) ...your code here...
+// #define STBTE_PROP_MAX(int n, short *tiledata) ...your code here...
+// // These return the allowable range for the property values for
+// // the specified slot. They are never called for boolean types.
+//
+// #define STBTE_PROP_FLOAT_SCALE(int n, short *tiledata, float *params)
+// // This rescales the float control for a given property; by default
+// // left mouse drags add integers, right mouse drags add fractions,
+// // but you can rescale this per-property.
+//
+// #define STBTE_FLOAT_CONTROL_GRANULARITY ... value ...
+// // This returns the number of pixels of mouse motion necessary
+// // to advance the object float control. Default is 4
+//
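To make the property macros concrete, here is one hypothetical configuration: tiles whose bottom layer is a made-up "spawner" tile (id 100) expose a single integer property. The id, property meaning, and name are invented for illustration; only the macro shapes are from the documentation above.

```c
/* Hypothetical property setup, defined before including the header.
   Tile id 100 and the "hit points" property are made-up examples. */
#define STBTE_PROP_TYPE(n, tiledata, params) \
        (((n) == 0 && (tiledata)[0] == 100) ? STBTE_PROP_int : 0)
#define STBTE_PROP_NAME(n, tiledata, params) \
        ((n) == 0 ? "hit points" : "")
```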
+// #define STBTE_ALLOW_LINK(short *src, float *src_data, \
+// short *dest, float *dest_data) ...your code...
+// // this returns true or false depending on whether you allow a link
+// // to be drawn from a tile 'src' to a tile 'dest'. if you don't
+// // define this, linking will not be supported
+//
+// #define STBTE_LINK_COLOR(short *src, float *src_data, \
+// short *dest, float *dest_data) ...your code...
+// // return a color encoded as a 24-bit unsigned integer in the
+// // form 0xRRGGBB. If you don't define this, default colors will
+// // be used.
+//
+//
+// [[ support for those below is not implemented yet ]]
+//
+// #define STBTE_HITTEST_TILE(x0,y0,id,mx,my) ...your code here...
+// // this returns true or false depending on whether the mouse
+// // pointer at mx,my is over (touching) a tile of type 'id'
+// // displayed at x0,y0. Normally stb_tilemap_editor just does
+// // this hittest based on the tile geometry, but if you have
+// // tiles whose images extend out of the tile, you'll need this.
+//
+// ADDITIONAL CONFIGURATION
+//
+// The following symbols set static limits which determine how much
+// memory will be allocated for the editor. You can override them by
+// defining them before including the implementation; raising a limit
+// increases memory usage accordingly.
+//
+// #define STBTE_MAX_TILEMAP_X 200 // max 4096
+// #define STBTE_MAX_TILEMAP_Y 200 // max 4096
+// #define STBTE_MAX_LAYERS 8 // max 32
+// #define STBTE_MAX_CATEGORIES 100
+// #define STBTE_UNDO_BUFFER_BYTES (1 << 24) // 16 MB
+// #define STBTE_MAX_COPY 90000 // e.g. 300x300
+// #define STBTE_MAX_PROPERTIES 10 // max properties per tile
+//
+// API
+//
+// Further documentation appears in the header-file section below.
+//
+// EDITING MULTIPLE LEVELS
+//
+// You can only have one active editor instance. To switch between multiple
+// levels, you can either store the levels in your own format and copy them
+// in and out of the editor format, or you can create multiple stbte_tilemap
+// objects and switch between them. The latter has the advantage that each
+// stbte_tilemap keeps its own undo state. (The clipboard is global, so
+// either approach allows cut&pasting between levels.)
+//
+// REVISION HISTORY
+// 0.42 fix compilation errors
+// 0.41 fix warnings
+// 0.40 fix warning
+// 0.39 fix warning
+// 0.38 fix warning
+// 0.37 fix warning
+// 0.36 minor compiler support
+// 0.35 layername button changes
+// - layername buttons grow with the layer panel
+// - fix stbte_create_map being declared as stbte_create
+// - fix declaration of stbte_create_map
+// 0.30 properties release
+// - properties panel for editing user-defined "object" properties
+// - can link each tile to one other tile
+// - keyboard interface
+// - fix eraser tool bug (worked in complex cases, failed in simple)
+// - undo/redo tools have visible disabled state
+// - tiles on higher layers draw on top of adjacent lower-layer tiles
+// 0.20 erasable release
+// - eraser tool
+// - fix bug when pasting into protected layer
+// - better color scheme
+// - internal-use color picker
+// 0.10 initial release
+//
+// TODO
+//
+// Separate scroll state for each category
+// Implement paint bucket
+// Support STBTE_HITTEST_TILE above
+// ?Cancel drags by clicking other button? - may be fixed
+// Finish support for toolbar at side
+//
+// CREDITS
+//
+//
+// Main editor & features
+// Sean Barrett
+// Additional features:
+// Josh Huelsman
+// Bugfixes:
+// Ryan Whitworth
+// Eugene Opalev
+// Rob Loach
+// github:wernsey
+//
+// LICENSE
+//
+// See end of file for license information.
+
+
+
+///////////////////////////////////////////////////////////////////////
+//
+// HEADER SECTION
+
+#ifndef STB_TILEMAP_INCLUDE_STB_TILEMAP_EDITOR_H
+#define STB_TILEMAP_INCLUDE_STB_TILEMAP_EDITOR_H
+
+#ifdef _WIN32
+ #ifndef _CRT_SECURE_NO_WARNINGS
+ #define _CRT_SECURE_NO_WARNINGS
+ #endif
+ #include <stdlib.h>
+ #include <stdio.h>
+#endif
+
+typedef struct stbte_tilemap stbte_tilemap;
+
+// these are the drawmodes used in STBTE_DRAW_TILE
+enum
+{
+ STBTE_drawmode_deemphasize = -1,
+ STBTE_drawmode_normal = 0,
+ STBTE_drawmode_emphasize = 1,
+};
+
+// these are the property types
+#define STBTE_PROP_none 0
+#define STBTE_PROP_int 1
+#define STBTE_PROP_float 2
+#define STBTE_PROP_bool 3
+#define STBTE_PROP_disabled 4
+
+////////
+//
+// creation
+//
+
+extern stbte_tilemap *stbte_create_map(int map_x, int map_y, int map_layers, int spacing_x, int spacing_y, int max_tiles);
+// create an editable tilemap
+// map_x : dimensions of map horizontally (user can change this in editor), <= STBTE_MAX_TILEMAP_X
+// map_y : dimensions of map vertically (user can change this in editor) <= STBTE_MAX_TILEMAP_Y
+// map_layers : number of layers to use (fixed), <= STBTE_MAX_LAYERS
+// spacing_x : initial horizontal distance between left edges of map tiles in stb_tilemap_editor pixels
+// spacing_y : initial vertical distance between top edges of map tiles in stb_tilemap_editor pixels
+// max_tiles : maximum number of tiles that can be defined
+//
+// If insufficient memory, returns NULL
+
+extern void stbte_define_tile(stbte_tilemap *tm, unsigned short id, unsigned int layermask, const char * category);
+// call this repeatedly for each tile to install the tile definitions into the editable tilemap
+// tm : tilemap created by stbte_create_map
+// id : unique identifier for each tile, 0 <= id < 32768
+// layermask : bitmask of which layers tile is allowed on: 1 = layer 0, 255 = layers 0..7
+// (note that onscreen, the editor numbers the layers from 1 not 0)
+// layer 0 is the furthest back, layer 1 is just in front of layer 0, etc
+// category : which category this tile is grouped in
+
+extern void stbte_set_display(int x0, int y0, int x1, int y1);
+// call this once to set the size; if you resize, call it again
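Putting the creation calls together, one-time setup might look like this sketch; the map size, layer masks, tile ids, categories, and display rectangle are all made-up example values.

```c
/* One-time setup following the creation API above; all values are examples. */
stbte_tilemap *tm = stbte_create_map(20, 15,   /* initial map size     */
                                     2,        /* layers (fixed)       */
                                     16, 16,   /* tile spacing, pixels */
                                     256);     /* max distinct tiles   */
if (tm != NULL) {
   stbte_define_tile(tm, 0, 1, "ground");   /* layer 0 only (mask 1) */
   stbte_define_tile(tm, 1, 2, "objects");  /* layer 1 only (mask 2) */
   stbte_set_display(0, 0, 640, 480);       /* screen region used by editor */
}
```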
+
+
+/////////
+//
+// every frame
+//
+
+extern void stbte_draw(stbte_tilemap *tm);
+
+extern void stbte_tick(stbte_tilemap *tm, float time_in_seconds_since_last_frame);
+
+////////////
+//
+// user input
+//
+
+// if you're using SDL, call the next function for SDL_MOUSEMOTION, SDL_MOUSEBUTTONDOWN, SDL_MOUSEBUTTONUP, SDL_MOUSEWHEEL;
+// the transformation lets you scale from SDL mouse coords to stb_tilemap_editor coords
+extern void stbte_mouse_sdl(stbte_tilemap *tm, const void *sdl_event, float xscale, float yscale, int xoffset, int yoffset);
+
+// otherwise, hook these up explicitly:
+extern void stbte_mouse_move(stbte_tilemap *tm, int x, int y, int shifted, int scrollkey);
+extern void stbte_mouse_button(stbte_tilemap *tm, int x, int y, int right, int down, int shifted, int scrollkey);
+extern void stbte_mouse_wheel(stbte_tilemap *tm, int x, int y, int vscroll);
+
+// note: at the moment, mouse wheel events (SDL_MOUSEWHEEL) are ignored.
+
+// for keyboard, define your own mapping from keys to the following actions.
+// this is totally optional, as all features are accessible with the mouse
+enum stbte_action
+{
+ STBTE_tool_select,
+ STBTE_tool_brush,
+ STBTE_tool_erase,
+ STBTE_tool_rectangle,
+ STBTE_tool_eyedropper,
+ STBTE_tool_link,
+ STBTE_act_toggle_grid,
+ STBTE_act_toggle_links,
+ STBTE_act_undo,
+ STBTE_act_redo,
+ STBTE_act_cut,
+ STBTE_act_copy,
+ STBTE_act_paste,
+ STBTE_scroll_left,
+ STBTE_scroll_right,
+ STBTE_scroll_up,
+ STBTE_scroll_down,
+};
+extern void stbte_action(stbte_tilemap *tm, enum stbte_action act);
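A key handler built on this enum could look like the sketch below; the character codes are arbitrary choices, since the mapping from keys to actions is entirely up to you.

```c
/* Example of forwarding your own key events to the editor; the key
   values here are placeholders for whatever your input layer delivers. */
void my_handle_key(stbte_tilemap *tm, int key)
{
   switch (key) {
      case 'b': stbte_action(tm, STBTE_tool_brush);      break;
      case 'e': stbte_action(tm, STBTE_tool_erase);      break;
      case 'z': stbte_action(tm, STBTE_act_undo);        break;
      case 'y': stbte_action(tm, STBTE_act_redo);        break;
      case 'g': stbte_action(tm, STBTE_act_toggle_grid); break;
      default: break;
   }
}
```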
+
+////////////////
+//
+// save/load
+//
+// There is no editor file format. You have to save and load the data yourself
+// through the following functions. You can also use these functions to get the
+// data to generate game-formatted levels directly. (But make sure you save
+// first! You may also want to autosave to a temp file periodically, etc etc.)
+
+#define STBTE_EMPTY -1
+
+extern void stbte_get_dimensions(stbte_tilemap *tm, int *max_x, int *max_y);
+// get the dimensions of the level, since the user can change them
+
+extern short* stbte_get_tile(stbte_tilemap *tm, int x, int y);
+// returns an array of shorts that is 'map_layers' in length. each short is
+// either one of the tile_id values from define_tile, or STBTE_EMPTY.
+
+extern float *stbte_get_properties(stbte_tilemap *tm, int x, int y);
+// get the property array associated with the tile at x,y. this is an
+// array of floats that is STBTE_MAX_PROPERTIES in length; you have to
+// interpret the slots according to the semantics you've chosen
+
+extern void stbte_get_link(stbte_tilemap *tm, int x, int y, int *destx, int *desty);
+// gets the link associated with the tile at x,y.
+
+extern void stbte_set_dimensions(stbte_tilemap *tm, int max_x, int max_y);
+// set the dimensions of the level, overrides previous stbte_create_map()
+// values or anything the user has changed
+
+extern void stbte_clear_map(stbte_tilemap *tm);
+// clears the map, including the region outside the defined region, so if the
+// user expands the map, they won't see garbage there
+
+extern void stbte_set_tile(stbte_tilemap *tm, int x, int y, int layer, signed short tile);
+// tile is your tile_id from define_tile, or STBTE_EMPTY
+
+extern void stbte_set_property(stbte_tilemap *tm, int x, int y, int n, float val);
+// set the value of the n'th slot of the tile at x,y
+
+extern void stbte_set_link(stbte_tilemap *tm, int x, int y, int destx, int desty);
+// set a link going from x,y to destx,desty. to force no link,
+// use destx=desty=-1
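The save/load functions above are typically used by walking the whole map, as in this sketch. `write_short` stands in for your own serializer, and `map_layers` is the layer count you passed to stbte_create_map.

```c
/* Sketch of dumping the editor's map into your own save format.
   write_short is a hypothetical stand-in for your serializer. */
void save_map(stbte_tilemap *tm, int map_layers)
{
   int w, h, x, y, i;
   stbte_get_dimensions(tm, &w, &h);   /* user may have resized the map */
   for (y = 0; y < h; ++y) {
      for (x = 0; x < w; ++x) {
         short *tiles = stbte_get_tile(tm, x, y); /* map_layers entries */
         for (i = 0; i < map_layers; ++i)
            write_short(tiles[i]);               /* STBTE_EMPTY if unset */
      }
   }
}
```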
+
+////////
+//
+// optional
+//
+
+extern void stbte_set_background_tile(stbte_tilemap *tm, short id);
+// selects the tile used to fill the bottom layer and to clear bottom tiles to;
+// should be the same ID as one of the tiles defined with stbte_define_tile
+
+extern void stbte_set_sidewidths(int left, int right);
+// call this once to set the left & right side widths. don't call
+// it again since the user can change it
+
+extern void stbte_set_spacing(stbte_tilemap *tm, int spacing_x, int spacing_y, int palette_spacing_x, int palette_spacing_y);
+// call this to set the spacing of map tiles and the spacing of palette tiles.
+// if you rescale your display, call it again (e.g. you can implement map zooming yourself)
+
+extern void stbte_set_layername(stbte_tilemap *tm, int layer, const char *layername);
+// sets a string name for your layer that shows in the layer selector. note that this
+// makes the layer selector wider. 'layer' is from 0..(map_layers-1)
+
+#endif
+
+#ifdef STB_TILEMAP_EDITOR_IMPLEMENTATION
+
+#ifndef STBTE_ASSERT
+#define STBTE_ASSERT assert
+#include <assert.h>
+#endif
+
+#ifdef _MSC_VER
+#define STBTE__NOTUSED(v) (void)(v)
+#else
+#define STBTE__NOTUSED(v) (void)sizeof(v)
+#endif
+
+#ifndef STBTE_MAX_TILEMAP_X
+#define STBTE_MAX_TILEMAP_X 200
+#endif
+
+#ifndef STBTE_MAX_TILEMAP_Y
+#define STBTE_MAX_TILEMAP_Y 200
+#endif
+
+#ifndef STBTE_MAX_LAYERS
+#define STBTE_MAX_LAYERS 8
+#endif
+
+#ifndef STBTE_MAX_CATEGORIES
+#define STBTE_MAX_CATEGORIES 100
+#endif
+
+#ifndef STBTE_MAX_COPY
+#define STBTE_MAX_COPY 65536
+#endif
+
+#ifndef STBTE_UNDO_BUFFER_BYTES
+#define STBTE_UNDO_BUFFER_BYTES (1 << 24) // 16 MB
+#endif
+
+#ifndef STBTE_PROP_TYPE
+#define STBTE__NO_PROPS
+#define STBTE_PROP_TYPE(n,td,tp) 0
+#endif
+
+#ifndef STBTE_PROP_NAME
+#define STBTE_PROP_NAME(n,td,tp) ""
+#endif
+
+#ifndef STBTE_MAX_PROPERTIES
+#define STBTE_MAX_PROPERTIES 10
+#endif
+
+#ifndef STBTE_PROP_MIN
+#define STBTE_PROP_MIN(n,td,tp) 0
+#endif
+
+#ifndef STBTE_PROP_MAX
+#define STBTE_PROP_MAX(n,td,tp) 100.0
+#endif
+
+#ifndef STBTE_PROP_FLOAT_SCALE
+#define STBTE_PROP_FLOAT_SCALE(n,td,tp) 1 // default scale size
+#endif
+
+#ifndef STBTE_FLOAT_CONTROL_GRANULARITY
+#define STBTE_FLOAT_CONTROL_GRANULARITY 4
+#endif
+
+
+#define STBTE__UNDO_BUFFER_COUNT (STBTE_UNDO_BUFFER_BYTES>>1)
+
+#if STBTE_MAX_TILEMAP_X > 4096 || STBTE_MAX_TILEMAP_Y > 4096
+#error "Maximum editable map size is 4096 x 4096"
+#endif
+#if STBTE_MAX_LAYERS > 32
+#error "Maximum layers allowed is 32"
+#endif
#if STBTE__UNDO_BUFFER_COUNT & (STBTE__UNDO_BUFFER_COUNT-1)
+#error "Undo buffer size must be a power of 2"
+#endif
+
+#if STBTE_MAX_PROPERTIES == 0
+#define STBTE__NO_PROPS
+#endif
+
+#ifdef STBTE__NO_PROPS
+#undef STBTE_MAX_PROPERTIES
+#define STBTE_MAX_PROPERTIES 1 // so we can declare arrays
+#endif
+
+typedef struct
+{
+ short x,y;
+} stbte__link;
+
+enum
+{
+ STBTE__base,
+ STBTE__outline,
+ STBTE__text,
+
+ STBTE__num_color_aspects,
+};
+
+enum
+{
+ STBTE__idle,
+ STBTE__over,
+ STBTE__down,
+ STBTE__over_down,
+ STBTE__selected,
+ STBTE__selected_over,
+ STBTE__disabled,
+ STBTE__num_color_states,
+};
+
+enum
+{
+ STBTE__cexpander,
+ STBTE__ctoolbar,
+ STBTE__ctoolbar_button,
+ STBTE__cpanel,
+ STBTE__cpanel_sider,
+ STBTE__cpanel_sizer,
+ STBTE__cscrollbar,
+ STBTE__cmapsize,
+ STBTE__clayer_button,
+ STBTE__clayer_hide,
+ STBTE__clayer_lock,
+ STBTE__clayer_solo,
+ STBTE__ccategory_button,
+
+ STBTE__num_color_modes,
+};
+
+#ifdef STBTE__COLORPICKER
+static char *stbte__color_names[] =
+{
+ "expander", "toolbar", "tool button", "panel",
   "panel c1", "panel c2", "scrollbar", "map button",
+ "layer", "hide", "lock", "solo",
+ "category",
+};
+#endif // STBTE__COLORPICKER
+
+ // idle, over, down, over&down, selected, sel&over, disabled
+static int stbte__color_table[STBTE__num_color_modes][STBTE__num_color_aspects][STBTE__num_color_states] =
+{
+ {
+ { 0x000000, 0x84987c, 0xdcdca8, 0xdcdca8, 0x40c040, 0x60d060, 0x505050, },
+ { 0xa4b090, 0xe0ec80, 0xffffc0, 0xffffc0, 0x80ff80, 0x80ff80, 0x606060, },
+ { 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x909090, },
+ }, {
+ { 0x808890, 0x606060, 0x606060, 0x606060, 0x606060, 0x606060, 0x606060, },
+ { 0x605860, 0x606060, 0x606060, 0x606060, 0x606060, 0x606060, 0x606060, },
+ { 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, },
+ }, {
+ { 0x3c5068, 0x7088a8, 0x647488, 0x94b4dc, 0x8890c4, 0x9caccc, 0x404040, },
+ { 0x889cb8, 0x889cb8, 0x889cb8, 0x889cb8, 0x84c4e8, 0xacc8ff, 0x0c0c08, },
+ { 0xbcc4cc, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x707074, },
+ }, {
+ { 0x403848, 0x403010, 0x403010, 0x403010, 0x403010, 0x403010, 0x303024, },
+ { 0x68546c, 0xc08040, 0xc08040, 0xc08040, 0xc08040, 0xc08040, 0x605030, },
+ { 0xf4e4ff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x909090, },
+ }, {
+ { 0xb4b04c, 0xacac60, 0xc0ffc0, 0xc0ffc0, 0x40c040, 0x60d060, 0x505050, },
+ { 0xa0a04c, 0xd0d04c, 0xffff80, 0xffff80, 0x80ff80, 0x80ff80, 0x606060, },
+ { 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x909090, },
+ }, {
+ { 0x40c440, 0x60d060, 0xc0ffc0, 0xc0ffc0, 0x40c040, 0x60d060, 0x505050, },
+ { 0x40c040, 0x80ff80, 0x80ff80, 0x80ff80, 0x80ff80, 0x80ff80, 0x606060, },
+ { 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x909090, },
+ }, {
+ { 0x9090ac, 0xa0a0b8, 0xbcb8cc, 0xbcb8cc, 0x909040, 0x909040, 0x909040, },
+ { 0xa0a0b8, 0xb0b4d0, 0xa0a0b8, 0xa0a0b8, 0xa0a050, 0xa0a050, 0xa0a050, },
+ { 0x808088, 0x808030, 0x808030, 0x808030, 0x808030, 0x808030, 0x808030, },
+ }, {
+ { 0x704c70, 0x885c8c, 0x9c68a4, 0xb870bc, 0xb490bc, 0xb490bc, 0x302828, },
+ { 0x646064, 0xcca8d4, 0xc060c0, 0xa07898, 0xe0b8e0, 0xe0b8e0, 0x403838, },
+ { 0xdccce4, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x909090, },
+ }, {
+ { 0x704c70, 0x885c8c, 0x9c68a4, 0xb870bc, 0xb490bc, 0xb490bc, 0x302828, },
+ { 0xb09cb4, 0xcca8d4, 0xc060c0, 0xa07898, 0xe0b8e0, 0xe0b8e0, 0x403838, },
+ { 0xdccce4, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0xffffff, 0x909090, },
+ }, {
+ { 0x646494, 0x888cb8, 0xb0b0b0, 0xb0b0cc, 0x9c9cf4, 0x8888b0, 0x50506c, },
+ { 0x9090a4, 0xb0b4d4, 0xb0b0dc, 0xb0b0cc, 0xd0d0fc, 0xd0d4f0, 0x606060, },
+ { 0xb4b4d4, 0xe4e4ff, 0xffffff, 0xffffff, 0xe0e4ff, 0xececff, 0x909090, },
+ }, {
+ { 0x646444, 0x888c64, 0xb0b0b0, 0xb0b088, 0xaca858, 0x88886c, 0x505050, },
+ { 0x88886c, 0xb0b490, 0xb0b0b0, 0xb0b088, 0xd8d898, 0xd0d4b0, 0x606060, },
+ { 0xb4b49c, 0xffffd8, 0xffffff, 0xffffd4, 0xffffdc, 0xffffcc, 0x909090, },
+ }, {
+ { 0x906464, 0xb48c8c, 0xd4b0b0, 0xdcb0b0, 0xff9c9c, 0xc88888, 0x505050, },
+ { 0xb47c80, 0xd4b4b8, 0xc4a8a8, 0xdcb0b0, 0xffc0c0, 0xfce8ec, 0x606060, },
+ { 0xe0b4b4, 0xffdcd8, 0xffd8d4, 0xffe0e4, 0xffece8, 0xffffff, 0x909090, },
+ }, {
+ { 0x403848, 0x403848, 0x403848, 0x886894, 0x7c80c8, 0x7c80c8, 0x302828, },
+ { 0x403848, 0x403848, 0x403848, 0x403848, 0x7c80c8, 0x7c80c8, 0x403838, },
+ { 0xc8c4c8, 0xffffff, 0xffffff, 0xffffff, 0xe8e8ec, 0xffffff, 0x909090, },
+ },
+};
+
+#define STBTE_COLOR_TILEMAP_BACKGROUND 0x000000
+#define STBTE_COLOR_TILEMAP_BORDER 0x203060
+#define STBTE_COLOR_TILEMAP_HIGHLIGHT 0xffffff
+#define STBTE_COLOR_GRID 0x404040
+#define STBTE_COLOR_SELECTION_OUTLINE1 0xdfdfdf
+#define STBTE_COLOR_SELECTION_OUTLINE2 0x303030
+#define STBTE_COLOR_TILEPALETTE_OUTLINE 0xffffff
+#define STBTE_COLOR_TILEPALETTE_BACKGROUND 0x000000
+
+#ifndef STBTE_LINK_COLOR
+#define STBTE_LINK_COLOR(src,sp,dest,dp) 0x5030ff
+#endif
+
+#ifndef STBTE_LINK_COLOR_DRAWING
+#define STBTE_LINK_COLOR_DRAWING 0xff40ff
+#endif
+
+#ifndef STBTE_LINK_COLOR_DISALLOWED
+#define STBTE_LINK_COLOR_DISALLOWED 0x602060
+#endif
+
+
+// disabled, selected, down, over
+static unsigned char stbte__state_to_index[2][2][2][2] =
+{
+ {
+ { { STBTE__idle , STBTE__over }, { STBTE__down , STBTE__over_down }, },
+ { { STBTE__selected, STBTE__selected_over }, { STBTE__down , STBTE__over_down }, },
+ },{
+ { { STBTE__disabled, STBTE__disabled }, { STBTE__disabled, STBTE__disabled }, },
+ { { STBTE__selected, STBTE__selected_over }, { STBTE__disabled, STBTE__disabled }, },
+ }
+};
+#define STBTE__INDEX_FOR_STATE(disable,select,down,over) stbte__state_to_index[disable][select][down][over]
+#define STBTE__INDEX_FOR_ID(id,disable,select) STBTE__INDEX_FOR_STATE(disable,select,STBTE__IS_ACTIVE(id),STBTE__IS_HOT(id))
+
+#define STBTE__FONT_HEIGHT 9
+static short stbte__font_offset[95+16];
+static short stbte__fontdata[769] =
+{
+ 4,9,6,9,9,9,9,8,9,8,4,9,7,7,7,7,4,2,6,8,6,6,7,3,4,4,8,6,3,6,2,6,6,6,6,6,6,
+ 6,6,6,6,6,2,3,5,4,5,6,6,6,6,6,6,6,6,6,6,6,6,7,6,7,7,7,6,7,6,6,6,6,7,7,6,6,
+ 6,4,6,4,7,7,3,6,6,5,6,6,5,6,6,4,5,6,4,7,6,6,6,6,6,6,6,6,6,7,6,6,6,5,2,5,8,
+ 0,0,0,0,2,253,130,456,156,8,72,184,64,2,125,66,64,160,64,146,511,146,146,
+ 511,146,146,511,146,511,257,341,297,341,297,341,257,511,16,56,124,16,16,16,
+ 124,56,16,96,144,270,261,262,136,80,48,224,192,160,80,40,22,14,15,3,448,496,
+ 496,240,232,20,10,5,2,112,232,452,450,225,113,58,28,63,30,60,200,455,257,
+ 257,0,0,0,257,257,455,120,204,132,132,159,14,4,4,14,159,132,132,204,120,8,
+ 24,56,120,56,24,8,32,48,56,60,56,48,32,0,0,0,0,111,111,7,7,0,0,7,7,34,127,
+ 127,34,34,127,127,34,36,46,107,107,58,18,99,51,24,12,102,99,48,122,79,93,
+ 55,114,80,4,7,3,62,127,99,65,65,99,127,62,8,42,62,28,28,62,42,8,8,8,62,62,
+ 8,8,128,224,96,8,8,8,8,8,8,96,96,96,48,24,12,6,3,62,127,89,77,127,62,64,66,
+ 127,127,64,64,98,115,89,77,71,66,33,97,73,93,119,35,24,28,22,127,127,16,39,
+ 103,69,69,125,57,62,127,73,73,121,48,1,1,113,121,15,7,54,127,73,73,127,54,
+ 6,79,73,105,63,30,54,54,128,246,118,8,28,54,99,65,20,20,20,20,65,99,54,28,
+ 8,2,3,105,109,7,2,30,63,33,45,47,46,124,126,19,19,126,124,127,127,73,73,127,
+ 54,62,127,65,65,99,34,127,127,65,99,62,28,127,127,73,73,73,65,127,127,9,9,
+ 9,1,62,127,65,73,121,121,127,127,8,8,127,127,65,65,127,127,65,65,32,96,64,
+ 64,127,63,127,127,8,28,54,99,65,127,127,64,64,64,64,127,127,6,12,6,127,127,
+ 127,127,6,12,24,127,127,62,127,65,65,65,127,62,127,127,9,9,15,6,62,127,65,
+ 81,49,127,94,127,127,9,25,127,102,70,79,73,73,121,49,1,1,127,127,1,1,63,127,
+ 64,64,127,63,15,31,48,96,48,31,15,127,127,48,24,48,127,127,99,119,28,28,119,
+ 99,7,15,120,120,15,7,97,113,89,77,71,67,127,127,65,65,3,6,12,24,48,96,65,
+ 65,127,127,8,12,6,3,6,12,8,64,64,64,64,64,64,64,3,7,4,32,116,84,84,124,120,
+ 127,127,68,68,124,56,56,124,68,68,68,56,124,68,68,127,127,56,124,84,84,92,
+ 24,8,124,126,10,10,56,380,324,324,508,252,127,127,4,4,124,120,72,122,122,
+ 64,256,256,256,506,250,126,126,16,56,104,64,66,126,126,64,124,124,24,56,28,
+ 124,120,124,124,4,4,124,120,56,124,68,68,124,56,508,508,68,68,124,56,56,124,
+ 68,68,508,508,124,124,4,4,12,8,72,92,84,84,116,36,4,4,62,126,68,68,60,124,
+ 64,64,124,124,28,60,96,96,60,28,28,124,112,56,112,124,28,68,108,56,56,108,
+ 68,284,316,352,320,508,252,68,100,116,92,76,68,8,62,119,65,65,127,127,65,
+ 65,119,62,8,16,24,12,12,24,24,12,4,
+};
+
+typedef struct
+{
+ short id;
+ unsigned short category_id;
+ char *category;
+ unsigned int layermask;
+} stbte__tileinfo;
+
+#define MAX_LAYERMASK (1 << (8*sizeof(unsigned int)))
+
+typedef short stbte__tiledata;
+
+#define STBTE__NO_TILE -1
+
+enum
+{
+ STBTE__panel_toolbar,
+ STBTE__panel_colorpick,
+ STBTE__panel_info,
+ STBTE__panel_layers,
+ STBTE__panel_props,
+ STBTE__panel_categories,
+ STBTE__panel_tiles,
+
+ STBTE__num_panel,
+};
+
+enum
+{
+ STBTE__side_left,
+ STBTE__side_right,
+ STBTE__side_top,
+ STBTE__side_bottom,
+};
+
+enum
+{
+ STBTE__tool_select,
+ STBTE__tool_brush,
+ STBTE__tool_erase,
+ STBTE__tool_rect,
+ STBTE__tool_eyedrop,
+ STBTE__tool_fill,
+ STBTE__tool_link,
+
+ STBTE__tool_showgrid,
+ STBTE__tool_showlinks,
+
+ STBTE__tool_undo,
+ STBTE__tool_redo,
+ // copy/cut/paste aren't included here because they're displayed differently
+
+ STBTE__num_tool,
+};
+
+// icons are stored in the 0-31 range of ASCII in the font
+static int toolchar[] = { 26,24,25,20,23,22,18, 19,17, 29,28, };
+
+enum
+{
+ STBTE__propmode_default,
+ STBTE__propmode_always,
+ STBTE__propmode_never,
+};
+
+enum
+{
+ STBTE__paint,
+
+ // from here down does hittesting
+ STBTE__tick,
+ STBTE__mousemove,
+ STBTE__mousewheel,
+ STBTE__leftdown,
+ STBTE__leftup,
+ STBTE__rightdown,
+ STBTE__rightup,
+};
+
+typedef struct
+{
+ int expanded, mode;
+ int delta_height; // number of rows they've requested for this
+ int side;
+ int width,height;
+ int x0,y0;
+} stbte__panel;
+
+typedef struct
+{
+ int x0,y0,x1,y1,color;
+} stbte__colorrect;
+
+#define STBTE__MAX_DELAYRECT 256
+
+typedef struct
+{
+ int tool, active_event;
+ int active_id, hot_id, next_hot_id;
+ int event;
+ int mx,my, dx,dy;
+ int ms_time;
+ int shift, scrollkey;
+ int initted;
+ int side_extended[2];
+ stbte__colorrect delayrect[STBTE__MAX_DELAYRECT];
+ int delaycount;
+ int show_grid, show_links;
+ int brush_state; // used to decide which kind of erasing
+ int eyedrop_x, eyedrop_y, eyedrop_last_layer;
+ int pasting, paste_x, paste_y;
+ int scrolling, start_x, start_y;
+ int last_mouse_x, last_mouse_y;
+ int accum_x, accum_y;
+ int linking;
+ int dragging;
+ int drag_x, drag_y, drag_w, drag_h;
+ int drag_offx, drag_offy, drag_dest_x, drag_dest_y;
+ int undoing;
+ int has_selection, select_x0, select_y0, select_x1, select_y1;
+ int sx,sy;
+ int x0,y0,x1,y1, left_width, right_width; // configurable widths
+ float alert_timer;
+ const char *alert_msg;
+ float dt;
+ stbte__panel panel[STBTE__num_panel];
+ short copybuffer[STBTE_MAX_COPY][STBTE_MAX_LAYERS];
+ float copyprops[STBTE_MAX_COPY][STBTE_MAX_PROPERTIES];
+#ifdef STBTE_ALLOW_LINK
+ stbte__link copylinks[STBTE_MAX_COPY];
+#endif
+ int copy_src_x, copy_src_y;
+ stbte_tilemap *copy_src;
+ int copy_width,copy_height,has_copy,copy_has_props;
+} stbte__ui_t;
+
+// there's only one UI system at a time, so we can globalize this
+static stbte__ui_t stbte__ui = { STBTE__tool_brush, 0 };
+
+#define STBTE__INACTIVE() (stbte__ui.active_id == 0)
+#define STBTE__IS_ACTIVE(id) (stbte__ui.active_id == (id))
+#define STBTE__IS_HOT(id) (stbte__ui.hot_id == (id))
+
+#define STBTE__BUTTON_HEIGHT (STBTE__FONT_HEIGHT + 2 * STBTE__BUTTON_INTERNAL_SPACING)
+#define STBTE__BUTTON_INTERNAL_SPACING (2 + (STBTE__FONT_HEIGHT>>4))
+
+typedef struct
+{
+ const char *name;
+ int locked;
+ int hidden;
+} stbte__layer;
+
+enum
+{
+ STBTE__unlocked,
+ STBTE__protected,
+ STBTE__locked,
+};
+
+struct stbte_tilemap
+{
+ stbte__tiledata data[STBTE_MAX_TILEMAP_Y][STBTE_MAX_TILEMAP_X][STBTE_MAX_LAYERS];
+ float props[STBTE_MAX_TILEMAP_Y][STBTE_MAX_TILEMAP_X][STBTE_MAX_PROPERTIES];
+ #ifdef STBTE_ALLOW_LINK
+ stbte__link link[STBTE_MAX_TILEMAP_Y][STBTE_MAX_TILEMAP_X];
+ int linkcount[STBTE_MAX_TILEMAP_Y][STBTE_MAX_TILEMAP_X];
+ #endif
+ int max_x, max_y, num_layers;
+ int spacing_x, spacing_y;
+ int palette_spacing_x, palette_spacing_y;
+ int scroll_x,scroll_y;
+ int cur_category, cur_tile, cur_layer;
+ char *categories[STBTE_MAX_CATEGORIES];
+ int num_categories, category_scroll;
+ stbte__tileinfo *tiles;
+ int num_tiles, max_tiles, digits;
+ unsigned char undo_available_valid;
+ unsigned char undo_available;
+ unsigned char redo_available;
+ unsigned char padding;
+ int cur_palette_count;
+ int palette_scroll;
+ int tileinfo_dirty;
+ stbte__layer layerinfo[STBTE_MAX_LAYERS];
+ int has_layer_names;
+ int layername_width;
+ int layer_scroll;
+ int propmode;
+ int solo_layer;
+ int undo_pos, undo_len, redo_len;
+ short background_tile;
+ unsigned char id_in_use[32768>>3];
+ short *undo_buffer;
+};
+
+static char *default_category = (char*) "[unassigned]";
+
+static void stbte__init_gui(void)
+{
+ int i,n;
+ stbte__ui.initted = 1;
+ // init UI state
+ stbte__ui.show_links = 1;
+ for (i=0; i < STBTE__num_panel; ++i) {
+ stbte__ui.panel[i].expanded = 1; // visible if not autohidden
+ stbte__ui.panel[i].delta_height = 0;
+ stbte__ui.panel[i].side = STBTE__side_left;
+ }
+ stbte__ui.panel[STBTE__panel_toolbar ].side = STBTE__side_top;
+ stbte__ui.panel[STBTE__panel_colorpick].side = STBTE__side_right;
+
+ if (stbte__ui.left_width == 0)
+ stbte__ui.left_width = 80;
+ if (stbte__ui.right_width == 0)
+ stbte__ui.right_width = 80;
+
+ // init font
+ n=95+16;
+ for (i=0; i < 95+16; ++i) {
+ stbte__font_offset[i] = n;
+ n += stbte__fontdata[i];
+ }
+}
+
+stbte_tilemap *stbte_create_map(int map_x, int map_y, int map_layers, int spacing_x, int spacing_y, int max_tiles)
+{
+ int i;
+ stbte_tilemap *tm;
+ STBTE_ASSERT(map_layers >= 0 && map_layers <= STBTE_MAX_LAYERS);
+ STBTE_ASSERT(map_x >= 0 && map_x <= STBTE_MAX_TILEMAP_X);
+ STBTE_ASSERT(map_y >= 0 && map_y <= STBTE_MAX_TILEMAP_Y);
+ if (map_x < 0 || map_y < 0 || map_layers < 0 ||
+ map_x > STBTE_MAX_TILEMAP_X || map_y > STBTE_MAX_TILEMAP_Y || map_layers > STBTE_MAX_LAYERS)
+ return NULL;
+
+ if (!stbte__ui.initted)
+ stbte__init_gui();
+
+ tm = (stbte_tilemap *) malloc(sizeof(*tm) + sizeof(*tm->tiles) * max_tiles + STBTE_UNDO_BUFFER_BYTES);
+ if (tm == NULL)
+ return NULL;
+
+ tm->tiles = (stbte__tileinfo *) (tm+1);
+ tm->undo_buffer = (short *) (tm->tiles + max_tiles);
+ tm->num_layers = map_layers;
+ tm->max_x = map_x;
+ tm->max_y = map_y;
+ tm->spacing_x = spacing_x;
+ tm->spacing_y = spacing_y;
+ tm->scroll_x = 0;
+ tm->scroll_y = 0;
+ tm->palette_scroll = 0;
+ tm->palette_spacing_x = spacing_x+1;
+ tm->palette_spacing_y = spacing_y+1;
+ tm->cur_category = -1;
+ tm->cur_tile = 0;
+ tm->solo_layer = -1;
+ tm->undo_len = 0;
+ tm->redo_len = 0;
+ tm->undo_pos = 0;
+ tm->category_scroll = 0;
+ tm->layer_scroll = 0;
+ tm->propmode = 0;
+ tm->has_layer_names = 0;
+ tm->layername_width = 0;
+ tm->undo_available_valid = 0;
+
+ for (i=0; i < tm->num_layers; ++i) {
+ tm->layerinfo[i].hidden = 0;
+ tm->layerinfo[i].locked = STBTE__unlocked;
+ tm->layerinfo[i].name = 0;
+ }
+
+ tm->background_tile = STBTE__NO_TILE;
+ stbte_clear_map(tm);
+
+ tm->max_tiles = max_tiles;
+ tm->num_tiles = 0;
+ for (i=0; i < 32768/8; ++i)
+ tm->id_in_use[i] = 0;
+ tm->tileinfo_dirty = 1;
+ return tm;
+}
+
+void stbte_set_background_tile(stbte_tilemap *tm, short id)
+{
+ int i;
+ STBTE_ASSERT(id >= -1);
+ // STBTE_ASSERT(id < 32768);
+ if (id < -1)
+ return;
+ for (i=0; i < STBTE_MAX_TILEMAP_X * STBTE_MAX_TILEMAP_Y; ++i)
+ if (tm->data[0][i][0] == -1)
+ tm->data[0][i][0] = id;
+ tm->background_tile = id;
+}
+
+void stbte_set_spacing(stbte_tilemap *tm, int spacing_x, int spacing_y, int palette_spacing_x, int palette_spacing_y)
+{
+ tm->spacing_x = spacing_x;
+ tm->spacing_y = spacing_y;
+ tm->palette_spacing_x = palette_spacing_x;
+ tm->palette_spacing_y = palette_spacing_y;
+}
+
+void stbte_set_sidewidths(int left, int right)
+{
+ stbte__ui.left_width = left;
+ stbte__ui.right_width = right;
+}
+
+void stbte_set_display(int x0, int y0, int x1, int y1)
+{
+ stbte__ui.x0 = x0;
+ stbte__ui.y0 = y0;
+ stbte__ui.x1 = x1;
+ stbte__ui.y1 = y1;
+}
+
+void stbte_define_tile(stbte_tilemap *tm, unsigned short id, unsigned int layermask, const char * category_c)
+{
+ char *category = (char *) category_c;
+ STBTE_ASSERT(id < 32768);
+ STBTE_ASSERT(tm->num_tiles < tm->max_tiles);
+ STBTE_ASSERT((tm->id_in_use[id>>3]&(1<<(id&7))) == 0);
+ if (id >= 32768 || tm->num_tiles >= tm->max_tiles || (tm->id_in_use[id>>3]&(1<<(id&7))))
+ return;
+
+ if (category == NULL)
+ category = (char*) default_category;
+ tm->id_in_use[id>>3] |= 1 << (id&7);
+ tm->tiles[tm->num_tiles].category = category;
+ tm->tiles[tm->num_tiles].id = id;
+ tm->tiles[tm->num_tiles].layermask = layermask;
+ ++tm->num_tiles;
+ tm->tileinfo_dirty = 1;
+}
+
+static int stbte__text_width(const char *str);
+
+void stbte_set_layername(stbte_tilemap *tm, int layer, const char *layername)
+{
+ STBTE_ASSERT(layer >= 0 && layer < tm->num_layers);
+ if (layer >= 0 && layer < tm->num_layers) {
+ int width;
+ tm->layerinfo[layer].name = layername;
+ tm->has_layer_names = 1;
+ width = stbte__text_width(layername);
+ tm->layername_width = (width > tm->layername_width ? width : tm->layername_width);
+ }
+}
+
+void stbte_get_dimensions(stbte_tilemap *tm, int *max_x, int *max_y)
+{
+ *max_x = tm->max_x;
+ *max_y = tm->max_y;
+}
+
+short* stbte_get_tile(stbte_tilemap *tm, int x, int y)
+{
+ STBTE_ASSERT(x >= 0 && x < tm->max_x && y >= 0 && y < tm->max_y);
+ if (x < 0 || x >= STBTE_MAX_TILEMAP_X || y < 0 || y >= STBTE_MAX_TILEMAP_Y)
+ return NULL;
+ return tm->data[y][x];
+}
+
+float *stbte_get_properties(stbte_tilemap *tm, int x, int y)
+{
+ STBTE_ASSERT(x >= 0 && x < tm->max_x && y >= 0 && y < tm->max_y);
+ if (x < 0 || x >= STBTE_MAX_TILEMAP_X || y < 0 || y >= STBTE_MAX_TILEMAP_Y)
+ return NULL;
+ return tm->props[y][x];
+}
+
+void stbte_get_link(stbte_tilemap *tm, int x, int y, int *destx, int *desty)
+{
+ int gx=-1,gy=-1;
+ STBTE_ASSERT(x >= 0 && x < tm->max_x && y >= 0 && y < tm->max_y);
+#ifdef STBTE_ALLOW_LINK
+ if (x >= 0 && x < STBTE_MAX_TILEMAP_X && y >= 0 && y < STBTE_MAX_TILEMAP_Y) {
+ gx = tm->link[y][x].x;
+ gy = tm->link[y][x].y;
+ if (gx >= 0)
+ if (!STBTE_ALLOW_LINK(tm->data[y][x], tm->props[y][x], tm->data[gy][gx], tm->props[gy][gx]))
+ gx = gy = -1;
+ }
+#endif
+ *destx = gx;
+ *desty = gy;
+}
+
+void stbte_set_property(stbte_tilemap *tm, int x, int y, int n, float val)
+{
+ tm->props[y][x][n] = val;
+}
+
+#ifdef STBTE_ALLOW_LINK
+static void stbte__set_link(stbte_tilemap *tm, int src_x, int src_y, int dest_x, int dest_y, int undo_mode);
+#endif
+
+enum
+{
+ STBTE__undo_none,
+ STBTE__undo_record,
+ STBTE__undo_block,
+};
+
+void stbte_set_link(stbte_tilemap *tm, int x, int y, int destx, int desty)
+{
+#ifdef STBTE_ALLOW_LINK
+ stbte__set_link(tm, x, y, destx, desty, STBTE__undo_none);
+#else
+ STBTE_ASSERT(0);
+#endif
+}
+
+
// stbte_get_tile (above) returns an array of map_layers shorts; each short is
// either one of the tile_id values from stbte_define_tile, or STBTE_EMPTY
+
+void stbte_set_dimensions(stbte_tilemap *tm, int map_x, int map_y)
+{
+ STBTE_ASSERT(map_x >= 0 && map_x <= STBTE_MAX_TILEMAP_X);
+ STBTE_ASSERT(map_y >= 0 && map_y <= STBTE_MAX_TILEMAP_Y);
+ if (map_x < 0 || map_y < 0 || map_x > STBTE_MAX_TILEMAP_X || map_y > STBTE_MAX_TILEMAP_Y)
+ return;
+ tm->max_x = map_x;
+ tm->max_y = map_y;
+}
+
+void stbte_clear_map(stbte_tilemap *tm)
+{
+ int i,j;
+ for (i=0; i < STBTE_MAX_TILEMAP_X * STBTE_MAX_TILEMAP_Y; ++i) {
+ tm->data[0][i][0] = tm->background_tile;
+ for (j=1; j < tm->num_layers; ++j)
+ tm->data[0][i][j] = STBTE__NO_TILE;
+ for (j=0; j < STBTE_MAX_PROPERTIES; ++j)
+ tm->props[0][i][j] = 0;
+ #ifdef STBTE_ALLOW_LINK
+ tm->link[0][i].x = -1;
+ tm->link[0][i].y = -1;
+ tm->linkcount[0][i] = 0;
+ #endif
+ }
+}
+
+void stbte_set_tile(stbte_tilemap *tm, int x, int y, int layer, signed short tile)
+{
+ STBTE_ASSERT(x >= 0 && x < tm->max_x && y >= 0 && y < tm->max_y);
+ STBTE_ASSERT(layer >= 0 && layer < tm->num_layers);
+ STBTE_ASSERT(tile >= -1);
+ //STBTE_ASSERT(tile < 32768);
+ if (x < 0 || x >= STBTE_MAX_TILEMAP_X || y < 0 || y >= STBTE_MAX_TILEMAP_Y)
+ return;
+ if (layer < 0 || layer >= tm->num_layers || tile < -1)
+ return;
+ tm->data[y][x][layer] = tile;
+}
+
+static void stbte__choose_category(stbte_tilemap *tm, int category)
+{
+ int i,n=0;
+ tm->cur_category = category;
+ for (i=0; i < tm->num_tiles; ++i)
+ if (tm->tiles[i].category_id == category || category == -1)
+ ++n;
+ tm->cur_palette_count = n;
+ tm->palette_scroll = 0;
+}
+
+static int stbte__strequal(char *p, char *q)
+{
+ while (*p)
+ if (*p++ != *q++) return 0;
+ return *q == 0;
+}
+
+static void stbte__compute_tileinfo(stbte_tilemap *tm)
+{
+ int i,j;
+
+ tm->num_categories=0;
+
+ for (i=0; i < tm->num_tiles; ++i) {
+ stbte__tileinfo *t = &tm->tiles[i];
+ // find category
+ for (j=0; j < tm->num_categories; ++j)
+ if (stbte__strequal(t->category, tm->categories[j]))
+ goto found;
+ tm->categories[j] = t->category;
+ ++tm->num_categories;
+ found:
+ t->category_id = (unsigned short) j;
+ }
+
+ // currently number of categories can never decrease because you
+ // can't remove tile definitions, but let's get it right anyway
+ if (tm->cur_category > tm->num_categories) {
+ tm->cur_category = -1;
+ }
+
+ stbte__choose_category(tm, tm->cur_category);
+
+ tm->tileinfo_dirty = 0;
+}
+
+static void stbte__prepare_tileinfo(stbte_tilemap *tm)
+{
+ if (tm->tileinfo_dirty)
+ stbte__compute_tileinfo(tm);
+}
+
+
+/////////////////////// undo system ////////////////////////
+
+// the undo system works by storing "commands" into a buffer, and
+// then playing back those commands. undo and redo have to store
+// the commands in different order.
+//
+// the commands are:
+//
+// 1) end_of_undo_record
+// -1:short
+//
+// 2) end_of_redo_record
+// -2:short
+//
+// 3) tile update
+// tile_id:short (-1..32767)
+// x_coord:short
+// y_coord:short
+// layer:short (0..31)
+//
+// 4) property update (also used for links)
+// value_hi:short
+// value_lo:short
+// y_coord:short
+// x_coord:short
+// property:short (256+prop#)
+//
+// Since we use a circular buffer, we might overwrite the undo storage.
// To detect this, before playing back commands we scan backwards; if we
// see an end_of_undo_record before hitting the relevant boundary, the
// record is wholly contained and safe to play back.
+//
+// When we read back through, we see them in reverse order, so
+// we'll see the layer number or property number first
+//
+// To be clearer about the circular buffer, there are two cases:
+// 1. a single record is larger than the whole buffer.
+// this is caught because the end_of_undo_record will
+// get overwritten.
+// 2. multiple records written are larger than the whole
+// buffer, so some of them have been overwritten by
+// the later ones. this is handled by explicitly tracking
+// the undo length; we never try to parse the data that
+// got overwritten
+
// wrap a position into the circular undo buffer
#define stbte__wrap(pos)  ((pos) & (STBTE__UNDO_BUFFER_COUNT-1))
+
+#define STBTE__undo_record -2
+#define STBTE__redo_record -3
+#define STBTE__undo_junk -4 // this is written underneath the undo pointer, never used
+
+static void stbte__write_undo(stbte_tilemap *tm, short value)
+{
+ int pos = tm->undo_pos;
+ tm->undo_buffer[pos] = value;
+ tm->undo_pos = stbte__wrap(pos+1);
+ tm->undo_len += (tm->undo_len < STBTE__UNDO_BUFFER_COUNT-2);
+ tm->redo_len -= (tm->redo_len > 0);
+ tm->undo_available_valid = 0;
+}
+
+static void stbte__write_redo(stbte_tilemap *tm, short value)
+{
+ int pos = tm->undo_pos;
+ tm->undo_buffer[pos] = value;
+ tm->undo_pos = stbte__wrap(pos-1);
+ tm->redo_len += (tm->redo_len < STBTE__UNDO_BUFFER_COUNT-2);
+ tm->undo_len -= (tm->undo_len > 0);
+ tm->undo_available_valid = 0;
+}
+
+static void stbte__begin_undo(stbte_tilemap *tm)
+{
+ tm->redo_len = 0;
+ stbte__write_undo(tm, STBTE__undo_record);
+ stbte__ui.undoing = 1;
+ stbte__ui.alert_msg = 0; // clear alert if they start doing something
+}
+
+static void stbte__end_undo(stbte_tilemap *tm)
+{
+ if (stbte__ui.undoing) {
+ // check if anything got written
+ int pos = stbte__wrap(tm->undo_pos-1);
+ if (tm->undo_buffer[pos] == STBTE__undo_record) {
+ // empty undo record, move back
+ tm->undo_pos = pos;
+ STBTE_ASSERT(tm->undo_len > 0);
+ tm->undo_len -= 1;
+ }
+ tm->undo_buffer[tm->undo_pos] = STBTE__undo_junk;
+ // otherwise do nothing
+
+ stbte__ui.undoing = 0;
+ }
+}
+
+static void stbte__undo_record(stbte_tilemap *tm, int x, int y, int i, int v)
+{
+ STBTE_ASSERT(stbte__ui.undoing);
+ if (stbte__ui.undoing) {
+ stbte__write_undo(tm, v);
+ stbte__write_undo(tm, x);
+ stbte__write_undo(tm, y);
+ stbte__write_undo(tm, i);
+ }
+}
+
+static void stbte__redo_record(stbte_tilemap *tm, int x, int y, int i, int v)
+{
+ stbte__write_redo(tm, v);
+ stbte__write_redo(tm, x);
+ stbte__write_redo(tm, y);
+ stbte__write_redo(tm, i);
+}
+
+static float stbte__extract_float(short s0, short s1)
+{
+ union { float f; short s[2]; } converter;
+ converter.s[0] = s0;
+ converter.s[1] = s1;
+ return converter.f;
+}
+
+static short stbte__extract_short(float f, int slot)
+{
+ union { float f; short s[2]; } converter;
+ converter.f = f;
+ return converter.s[slot];
+}
+
+static void stbte__undo_record_prop(stbte_tilemap *tm, int x, int y, int i, short s0, short s1)
+{
+ STBTE_ASSERT(stbte__ui.undoing);
+ if (stbte__ui.undoing) {
+ stbte__write_undo(tm, s1);
+ stbte__write_undo(tm, s0);
+ stbte__write_undo(tm, x);
+ stbte__write_undo(tm, y);
+ stbte__write_undo(tm, 256+i);
+ }
+}
+
+static void stbte__undo_record_prop_float(stbte_tilemap *tm, int x, int y, int i, float f)
+{
+ stbte__undo_record_prop(tm, x,y,i, stbte__extract_short(f,0), stbte__extract_short(f,1));
+}
+
+static void stbte__redo_record_prop(stbte_tilemap *tm, int x, int y, int i, short s0, short s1)
+{
+ stbte__write_redo(tm, s1);
+ stbte__write_redo(tm, s0);
+ stbte__write_redo(tm, x);
+ stbte__write_redo(tm, y);
+ stbte__write_redo(tm, 256+i);
+}
+
+
+static int stbte__undo_find_end(stbte_tilemap *tm)
+{
+ // first scan through for the end record
+ int i, pos = stbte__wrap(tm->undo_pos-1);
+ for (i=0; i < tm->undo_len;) {
+ STBTE_ASSERT(tm->undo_buffer[pos] != STBTE__undo_junk);
+ if (tm->undo_buffer[pos] == STBTE__undo_record)
+ break;
+ if (tm->undo_buffer[pos] >= 255)
+ pos = stbte__wrap(pos-5), i += 5;
+ else
+ pos = stbte__wrap(pos-4), i += 4;
+ }
+ if (i >= tm->undo_len)
+ return -1;
+ return pos;
+}
+
+static void stbte__undo(stbte_tilemap *tm)
+{
+ int i, pos, endpos;
+ endpos = stbte__undo_find_end(tm);
+ if (endpos < 0)
+ return;
+
+ // we found a complete undo record
+ pos = stbte__wrap(tm->undo_pos-1);
+
+ // start a redo record
+ stbte__write_redo(tm, STBTE__redo_record);
+
+ // so now go back through undo and apply in reverse
+ // order, and copy it to redo
+ for (i=0; endpos != pos; i += 4) {
+ int x,y,n,v;
+ // get the undo entry
+ n = tm->undo_buffer[pos];
+ y = tm->undo_buffer[stbte__wrap(pos-1)];
+ x = tm->undo_buffer[stbte__wrap(pos-2)];
+ v = tm->undo_buffer[stbte__wrap(pos-3)];
+ if (n >= 255) {
+ short s0=0,s1=0;
+ int v2 = tm->undo_buffer[stbte__wrap(pos-4)];
+ pos = stbte__wrap(pos-5);
+ if (n > 255) {
+ float vf = stbte__extract_float(v, v2);
+ s0 = stbte__extract_short(tm->props[y][x][n-256], 0);
+ s1 = stbte__extract_short(tm->props[y][x][n-256], 1);
+ tm->props[y][x][n-256] = vf;
+ } else {
+#ifdef STBTE_ALLOW_LINK
+ s0 = tm->link[y][x].x;
+ s1 = tm->link[y][x].y;
+ stbte__set_link(tm, x,y, v, v2, STBTE__undo_none);
+#endif
+ }
+ // write the redo entry
+ stbte__redo_record_prop(tm, x, y, n-256, s0,s1);
+ // apply the undo entry
+ } else {
+ pos = stbte__wrap(pos-4);
+ // write the redo entry
+ stbte__redo_record(tm, x, y, n, tm->data[y][x][n]);
+ // apply the undo entry
+ tm->data[y][x][n] = (short) v;
+ }
+ }
+ // overwrite undo record with junk
+ tm->undo_buffer[tm->undo_pos] = STBTE__undo_junk;
+}
+
+static int stbte__redo_find_end(stbte_tilemap *tm)
+{
+ // first scan through for the end record
+ int i, pos = stbte__wrap(tm->undo_pos+1);
+ for (i=0; i < tm->redo_len;) {
+ STBTE_ASSERT(tm->undo_buffer[pos] != STBTE__undo_junk);
+ if (tm->undo_buffer[pos] == STBTE__redo_record)
+ break;
+ if (tm->undo_buffer[pos] >= 255)
+ pos = stbte__wrap(pos+5), i += 5;
+ else
+ pos = stbte__wrap(pos+4), i += 4;
+ }
+ if (i >= tm->redo_len)
+ return -1; // this should only ever happen if redo buffer is empty
+ return pos;
+}
+
+static void stbte__redo(stbte_tilemap *tm)
+{
   // find the end of the redo record
+ int i, pos, endpos;
+ endpos = stbte__redo_find_end(tm);
+ if (endpos < 0)
+ return;
+
+ // we found a complete redo record
+ pos = stbte__wrap(tm->undo_pos+1);
+
+ // start an undo record
+ stbte__write_undo(tm, STBTE__undo_record);
+
+ for (i=0; pos != endpos; i += 4) {
+ int x,y,n,v;
+ n = tm->undo_buffer[pos];
+ y = tm->undo_buffer[stbte__wrap(pos+1)];
+ x = tm->undo_buffer[stbte__wrap(pos+2)];
+ v = tm->undo_buffer[stbte__wrap(pos+3)];
+ if (n >= 255) {
+ int v2 = tm->undo_buffer[stbte__wrap(pos+4)];
+ short s0=0,s1=0;
+ pos = stbte__wrap(pos+5);
+ if (n > 255) {
+ float vf = stbte__extract_float(v, v2);
+ s0 = stbte__extract_short(tm->props[y][x][n-256],0);
+ s1 = stbte__extract_short(tm->props[y][x][n-256],1);
+ tm->props[y][x][n-256] = vf;
+ } else {
+#ifdef STBTE_ALLOW_LINK
+ s0 = tm->link[y][x].x;
+ s1 = tm->link[y][x].y;
+ stbte__set_link(tm, x,y,v,v2, STBTE__undo_none);
+#endif
+ }
+ // don't use stbte__undo_record_prop because it's guarded
+ stbte__write_undo(tm, s1);
+ stbte__write_undo(tm, s0);
+ stbte__write_undo(tm, x);
+ stbte__write_undo(tm, y);
+ stbte__write_undo(tm, n);
+ } else {
+ pos = stbte__wrap(pos+4);
+ // don't use stbte__undo_record because it's guarded
+ stbte__write_undo(tm, tm->data[y][x][n]);
+ stbte__write_undo(tm, x);
+ stbte__write_undo(tm, y);
+ stbte__write_undo(tm, n);
+ tm->data[y][x][n] = (short) v;
+ }
+ }
+ tm->undo_buffer[tm->undo_pos] = STBTE__undo_junk;
+}
+
// because detecting whether undo/redo are available requires scanning the
// undo buffer, we cache the result and recompute it only when invalidated
+static void stbte__recompute_undo_available(stbte_tilemap *tm)
+{
+ tm->undo_available = (stbte__undo_find_end(tm) >= 0);
+ tm->redo_available = (stbte__redo_find_end(tm) >= 0);
+}
+
+static int stbte__undo_available(stbte_tilemap *tm)
+{
+ if (!tm->undo_available_valid)
+ stbte__recompute_undo_available(tm);
+ return tm->undo_available;
+}
+
+static int stbte__redo_available(stbte_tilemap *tm)
+{
+ if (!tm->undo_available_valid)
+ stbte__recompute_undo_available(tm);
+ return tm->redo_available;
+}
+
+///////////////////////////////////////////////////////////////////////////////////////////////////
+
+#ifdef STBTE_ALLOW_LINK
+static void stbte__set_link(stbte_tilemap *tm, int src_x, int src_y, int dest_x, int dest_y, int undo_mode)
+{
+ stbte__link *a;
+ STBTE_ASSERT(src_x >= 0 && src_x < STBTE_MAX_TILEMAP_X && src_y >= 0 && src_y < STBTE_MAX_TILEMAP_Y);
+ a = &tm->link[src_y][src_x];
+ // check if it's a do nothing
+ if (a->x == dest_x && a->y == dest_y)
+ return;
+ if (undo_mode != STBTE__undo_none ) {
+ if (undo_mode == STBTE__undo_block) stbte__begin_undo(tm);
+ stbte__undo_record_prop(tm, src_x, src_y, -1, a->x, a->y);
+ if (undo_mode == STBTE__undo_block) stbte__end_undo(tm);
+ }
+ // check if there's an existing link
+ if (a->x >= 0) {
+ // decrement existing link refcount
+ STBTE_ASSERT(tm->linkcount[a->y][a->x] > 0);
+ --tm->linkcount[a->y][a->x];
+ }
+ // increment new dest
+ if (dest_x >= 0) {
+ ++tm->linkcount[dest_y][dest_x];
+ }
+ a->x = dest_x;
+ a->y = dest_y;
+}
+#endif
+
+
+static void stbte__draw_rect(int x0, int y0, int x1, int y1, unsigned int color)
+{
+ STBTE_DRAW_RECT(x0,y0,x1,y1, color);
+}
+
+#ifdef STBTE_ALLOW_LINK
+static void stbte__draw_line(int x0, int y0, int x1, int y1, unsigned int color)
+{
+ int temp;
+ if (x1 < x0) temp=x0,x0=x1,x1=temp;
+ if (y1 < y0) temp=y0,y0=y1,y1=temp;
+ stbte__draw_rect(x0,y0,x1+1,y1+1,color);
+}
+
+static void stbte__draw_link(int x0, int y0, int x1, int y1, unsigned int color)
+{
+ stbte__draw_line(x0,y0,x0,y1, color);
+ stbte__draw_line(x0,y1,x1,y1, color);
+}
+#endif
+
+static void stbte__draw_frame(int x0, int y0, int x1, int y1, unsigned int color)
+{
+ stbte__draw_rect(x0,y0,x1-1,y0+1,color);
+ stbte__draw_rect(x1-1,y0,x1,y1-1,color);
+ stbte__draw_rect(x0+1,y1-1,x1,y1,color);
+ stbte__draw_rect(x0,y0+1,x0+1,y1,color);
+}
+
+static int stbte__get_char_width(int ch)
+{
+ return stbte__fontdata[ch-16];
+}
+
+static short *stbte__get_char_bitmap(int ch)
+{
+ return stbte__fontdata + stbte__font_offset[ch-16];
+}
+
+static void stbte__draw_bitmask_as_columns(int x, int y, short bitmask, int color)
+{
+ int start_i = -1, i=0;
+ while (bitmask) {
+ if (bitmask & (1<<i)) {
+ if (start_i < 0)
+ start_i = i;
+ } else if (start_i >= 0) {
+ stbte__draw_rect(x, y+start_i, x+1, y+i, color);
+ start_i = -1;
+ }
+ bitmask &= ~(1<<i); // clear the bit we just processed
+ ++i;
+ }
+ if (start_i >= 0)
+ stbte__draw_rect(x, y+start_i, x+1, y+i, color);
+}
+
+static void stbte__draw_bitmap(int x, int y, int w, short *bitmap, int color)
+{
+ int i;
+ for (i=0; i < w; ++i)
+ stbte__draw_bitmask_as_columns(x+i, y, *bitmap++, color);
+}
+
+static void stbte__draw_text_core(int x, int y, const char *str, int w, int color, int digitspace)
+{
+ int x_end = x+w;
+ while (*str) {
+ int c = *str++;
+ int cw = stbte__get_char_width(c);
+ if (x + cw > x_end)
+ break;
+ stbte__draw_bitmap(x, y, cw, stbte__get_char_bitmap(c), color);
+ if (digitspace && c == ' ')
+ cw = stbte__get_char_width('0');
+ x += cw+1;
+ }
+}
+
+static void stbte__draw_text(int x, int y, const char *str, int w, int color)
+{
+ stbte__draw_text_core(x,y,str,w,color,0);
+}
+
+static int stbte__text_width(const char *str)
+{
+ int x = 0;
+ while (*str) {
+ int c = *str++;
+ int cw = stbte__get_char_width(c);
+ x += cw+1;
+ }
+ return x;
+}
+
+static void stbte__draw_frame_delayed(int x0, int y0, int x1, int y1, int color)
+{
+ if (stbte__ui.delaycount < STBTE__MAX_DELAYRECT) {
+ stbte__colorrect r = { x0,y0,x1,y1,color };
+ stbte__ui.delayrect[stbte__ui.delaycount++] = r;
+ }
+}
+
+static void stbte__flush_delay(void)
+{
+ stbte__colorrect *r;
+ int i;
+ r = stbte__ui.delayrect;
+ for (i=0; i < stbte__ui.delaycount; ++i,++r)
+ stbte__draw_frame(r->x0,r->y0,r->x1,r->y1,r->color);
+ stbte__ui.delaycount = 0;
+}
+
+static void stbte__activate(int id)
+{
+ stbte__ui.active_id = id;
+ stbte__ui.active_event = stbte__ui.event;
+ stbte__ui.accum_x = 0;
+ stbte__ui.accum_y = 0;
+}
+
+static int stbte__hittest(int x0, int y0, int x1, int y1, int id)
+{
+ int over = stbte__ui.mx >= x0 && stbte__ui.my >= y0
+ && stbte__ui.mx < x1 && stbte__ui.my < y1;
+
+ if (over && stbte__ui.event >= STBTE__tick)
+ stbte__ui.next_hot_id = id;
+
+ return over;
+}
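`stbte__hittest` treats its rectangle as half-open: `x0`/`y0` are inclusive, `x1`/`y1` exclusive, so adjacent widgets never both claim the boundary pixel. The containment test in isolation (a sketch, with an illustrative name):

```c
#include <assert.h>

/* half-open rectangle containment, as used by stbte__hittest:
   x1 and y1 are exclusive bounds */
static int ex_in_rect_halfopen(int mx, int my, int x0, int y0, int x1, int y1)
{
    return mx >= x0 && my >= y0 && mx < x1 && my < y1;
}
```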
+
+static int stbte__button_core(int id)
+{
+ switch (stbte__ui.event) {
+ case STBTE__leftdown:
+ if (stbte__ui.hot_id == id && STBTE__INACTIVE())
+ stbte__activate(id);
+ break;
+ case STBTE__leftup:
+ if (stbte__ui.active_id == id && STBTE__IS_HOT(id)) {
+ stbte__activate(0);
+ return 1;
+ }
+ break;
+ case STBTE__rightdown:
+ if (stbte__ui.hot_id == id && STBTE__INACTIVE())
+ stbte__activate(id);
+ break;
+ case STBTE__rightup:
+ if (stbte__ui.active_id == id && STBTE__IS_HOT(id)) {
+ stbte__activate(0);
+ return -1;
+ }
+ break;
+ }
+ return 0;
+}
+
+static void stbte__draw_box(int x0, int y0, int x1, int y1, int colormode, int colorindex)
+{
+ stbte__draw_rect (x0,y0,x1,y1, stbte__color_table[colormode][STBTE__base ][colorindex]);
+ stbte__draw_frame(x0,y0,x1,y1, stbte__color_table[colormode][STBTE__outline][colorindex]);
+}
+
+static void stbte__draw_textbox(int x0, int y0, int x1, int y1, char *text, int xoff, int yoff, int colormode, int colorindex)
+{
+ stbte__draw_box(x0,y0,x1,y1,colormode,colorindex);
+ stbte__draw_text(x0+xoff,y0+yoff, text, x1-x0-xoff-1, stbte__color_table[colormode][STBTE__text][colorindex]);
+}
+
+static int stbte__button(int colormode, const char *label, int x, int y, int textoff, int width, int id, int toggled, int disabled)
+{
+ int x0=x,y0=y, x1=x+width,y1=y+STBTE__BUTTON_HEIGHT;
+ int s = STBTE__BUTTON_INTERNAL_SPACING;
+
+ if (!disabled) stbte__hittest(x0,y0,x1,y1,id);
+
+ if (stbte__ui.event == STBTE__paint)
+ stbte__draw_textbox(x0,y0,x1,y1, (char*) label,s+textoff,s, colormode, STBTE__INDEX_FOR_ID(id,disabled,toggled));
+ if (disabled)
+ return 0;
+ return (stbte__button_core(id) == 1);
+}
+
+static int stbte__button_icon(int colormode, char ch, int x, int y, int width, int id, int toggled, int disabled)
+{
+ int x0=x,y0=y, x1=x+width,y1=y+STBTE__BUTTON_HEIGHT;
+ int s = STBTE__BUTTON_INTERNAL_SPACING;
+
+ stbte__hittest(x0,y0,x1,y1,id);
+
+ if (stbte__ui.event == STBTE__paint) {
+ char label[2] = { ch, 0 };
+ int pad = (9 - stbte__get_char_width(ch))/2;
+ stbte__draw_textbox(x0,y0,x1,y1, label,s+pad,s, colormode, STBTE__INDEX_FOR_ID(id,disabled,toggled));
+ }
+ if (disabled)
+ return 0;
+ return (stbte__button_core(id) == 1);
+}
+
+static int stbte__minibutton(int colormode, int x, int y, int ch, int id)
+{
+ int x0 = x, y0 = y, x1 = x+8, y1 = y+7;
+ stbte__hittest(x0,y0,x1,y1,id);
+ if (stbte__ui.event == STBTE__paint) {
+ char str[2] = { (char)ch, 0 };
+ stbte__draw_textbox(x0,y0,x1,y1, str,1,0,colormode, STBTE__INDEX_FOR_ID(id,0,0));
+ }
+ return stbte__button_core(id);
+}
+
+static int stbte__layerbutton(int x, int y, int ch, int id, int toggled, int disabled, int colormode)
+{
+ int x0 = x, y0 = y, x1 = x+10, y1 = y+11;
+ if (!disabled) stbte__hittest(x0,y0,x1,y1,id);
+ if (stbte__ui.event == STBTE__paint) {
+ char str[2] = { (char)ch, 0 };
+ int off = (9-stbte__get_char_width(ch))/2;
+ stbte__draw_textbox(x0,y0,x1,y1, str, off+1,2, colormode, STBTE__INDEX_FOR_ID(id,disabled,toggled));
+ }
+ if (disabled)
+ return 0;
+ return stbte__button_core(id);
+}
+
+static int stbte__microbutton(int x, int y, int size, int id, int colormode)
+{
+ int x0 = x, y0 = y, x1 = x+size, y1 = y+size;
+ stbte__hittest(x0,y0,x1,y1,id);
+ if (stbte__ui.event == STBTE__paint) {
+ stbte__draw_box(x0,y0,x1,y1, colormode, STBTE__INDEX_FOR_ID(id,0,0));
+ }
+ return stbte__button_core(id);
+}
+
+static int stbte__microbutton_dragger(int x, int y, int size, int id, int *pos)
+{
+ int x0 = x, y0 = y, x1 = x+size, y1 = y+size;
+ stbte__hittest(x0,y0,x1,y1,id);
+ switch (stbte__ui.event) {
+ case STBTE__paint:
+ stbte__draw_box(x0,y0,x1,y1, STBTE__cexpander, STBTE__INDEX_FOR_ID(id,0,0));
+ break;
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__activate(id);
+ stbte__ui.sx = stbte__ui.mx - *pos;
+ }
+ break;
+ case STBTE__mousemove:
+ if (STBTE__IS_ACTIVE(id) && stbte__ui.active_event == STBTE__leftdown) {
+ *pos = stbte__ui.mx - stbte__ui.sx;
+ }
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_ACTIVE(id))
+ stbte__activate(0);
+ break;
+ default:
+ return stbte__button_core(id);
+ }
+ return 0;
+}
+
+static int stbte__category_button(const char *label, int x, int y, int width, int id, int toggled)
+{
+ int x0=x,y0=y, x1=x+width,y1=y+STBTE__BUTTON_HEIGHT;
+ int s = STBTE__BUTTON_INTERNAL_SPACING;
+
+ stbte__hittest(x0,y0,x1,y1,id);
+
+ if (stbte__ui.event == STBTE__paint)
+ stbte__draw_textbox(x0,y0,x1,y1, (char*) label, s,s, STBTE__ccategory_button, STBTE__INDEX_FOR_ID(id,0,toggled));
+
+ return (stbte__button_core(id) == 1);
+}
+
+enum
+{
+ STBTE__none,
+ STBTE__begin,
+ STBTE__end,
+ STBTE__change,
+};
+
+ // returns STBTE__begin when the drag starts, STBTE__change while the
+ // value changes, STBTE__end at the end of the drag
+static int stbte__slider(int x0, int w, int y, int range, int *value, int id)
+{
+ int x1 = x0+w;
+ int pos = *value * w / (range+1);
+ int event_mouse_move = STBTE__change;
+ stbte__hittest(x0,y-2,x1,y+3,id);
+ switch (stbte__ui.event) {
+ case STBTE__paint:
+ stbte__draw_rect(x0,y,x1,y+1, 0x808080);
+ stbte__draw_rect(x0+pos-1,y-1,x0+pos+2,y+2, 0xffffff);
+ break;
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__activate(id);
+ event_mouse_move = STBTE__begin;
+ }
+ // fall through
+ case STBTE__mousemove:
+ if (STBTE__IS_ACTIVE(id)) {
+ int v = (stbte__ui.mx-x0)*(range+1)/w;
+ if (v < 0) v = 0; else if (v > range) v = range;
+ *value = v;
+ return event_mouse_move;
+ }
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_ACTIVE(id)) {
+ stbte__activate(0);
+ return STBTE__end;
+ }
+ break;
+ }
+ return STBTE__none;
+}
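The slider maps the mouse's horizontal offset across `w` pixels onto `range+1` buckets, then clamps. That mapping, extracted into a free function for illustration (the name is not part of the library):

```c
#include <assert.h>

/* mouse-to-value mapping used inside stbte__slider:
   w pixels cover range+1 buckets, result clamped to [0, range] */
static int ex_slider_value(int mx, int x0, int w, int range)
{
    int v = (mx - x0) * (range + 1) / w;
    if (v < 0) v = 0; else if (v > range) v = range;
    return v;
}
```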
+
+#if defined(_WIN32) && defined(__STDC_WANT_SECURE_LIB__)
+ #define stbte__sprintf sprintf_s
+ #define stbte__sizeof(s) , sizeof(s)
+#else
+ #define stbte__sprintf sprintf
+ #define stbte__sizeof(s)
+#endif
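The `#if` block above selects `sprintf_s` on secure-CRT Windows builds by letting `stbte__sizeof` expand to an extra `, sizeof(s)` argument, and to nothing elsewhere, so one call site serves both signatures. The same trick with illustrative macro names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* argument-injecting macro pair; EX_SIZEOF adds ", sizeof(buf)" only
   when the secure CRT is in use */
#if defined(_WIN32) && defined(__STDC_WANT_SECURE_LIB__)
   #define EX_SPRINTF   sprintf_s
   #define EX_SIZEOF(s) , sizeof(s)
#else
   #define EX_SPRINTF   sprintf
   #define EX_SIZEOF(s)
#endif

static const char *ex_format_value(float v)
{
    static char buf[32];
    /* expands to sprintf(buf, ...) or sprintf_s(buf, sizeof(buf), ...) */
    EX_SPRINTF(buf EX_SIZEOF(buf), "%6.2f", v);
    return buf;
}
```

Note `sizeof` is applied to a true array (`buf[32]`), not a decayed pointer parameter, so the secure variant receives the real capacity.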
+
+static int stbte__float_control(int x0, int y0, int w, float minv, float maxv, float scale, const char *fmt, float *value, int colormode, int id)
+{
+ int x1 = x0+w;
+ int y1 = y0+11;
+ stbte__hittest(x0,y0,x1,y1,id);
+ switch (stbte__ui.event) {
+ case STBTE__paint: {
+ char text[32];
+ stbte__sprintf(text stbte__sizeof(text), fmt ? fmt : "%6.2f", *value);
+ stbte__draw_textbox(x0,y0,x1,y1, text, 1,2, colormode, STBTE__INDEX_FOR_ID(id,0,0));
+ break;
+ }
+ case STBTE__leftdown:
+ case STBTE__rightdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE())
+ stbte__activate(id);
+ return STBTE__begin;
+ break;
+ case STBTE__leftup:
+ case STBTE__rightup:
+ if (STBTE__IS_ACTIVE(id)) {
+ stbte__activate(0);
+ return STBTE__end;
+ }
+ break;
+ case STBTE__mousemove:
+ if (STBTE__IS_ACTIVE(id)) {
+ float v = *value, delta;
+ int ax = stbte__ui.accum_x/STBTE_FLOAT_CONTROL_GRANULARITY;
+ int ay = stbte__ui.accum_y/STBTE_FLOAT_CONTROL_GRANULARITY;
+ stbte__ui.accum_x -= ax*STBTE_FLOAT_CONTROL_GRANULARITY;
+ stbte__ui.accum_y -= ay*STBTE_FLOAT_CONTROL_GRANULARITY;
+ if (stbte__ui.shift) {
+ if (stbte__ui.active_event == STBTE__leftdown)
+ delta = ax * 16.0f + ay;
+ else
+ delta = ax / 16.0f + ay / 256.0f;
+ } else {
+ if (stbte__ui.active_event == STBTE__leftdown)
+ delta = ax*10.0f + ay;
+ else
+ delta = ax * 0.1f + ay * 0.01f;
+ }
+ v += delta * scale;
+ if (v < minv) v = minv;
+ if (v > maxv) v = maxv;
+ *value = v;
+ return STBTE__change;
+ }
+ break;
+ }
+ return STBTE__none;
+}
+
+static void stbte__scrollbar(int x, int y0, int y1, int *val, int v0, int v1, int num_vis, int id)
+{
+ int thumbpos;
+ if (v1 - v0 <= num_vis)
+ return;
+
+ // generate thumbpos from numvis
+ thumbpos = y0+2 + (y1-y0-4) * *val / (v1 - v0 - num_vis);
+ if (thumbpos < y0) thumbpos = y0;
+ if (thumbpos >= y1) thumbpos = y1;
+ stbte__hittest(x-1,y0,x+2,y1,id);
+ switch (stbte__ui.event) {
+ case STBTE__paint:
+ stbte__draw_rect(x,y0,x+1,y1, stbte__color_table[STBTE__cscrollbar][STBTE__text][STBTE__idle]);
+ stbte__draw_box(x-1,thumbpos-3,x+2,thumbpos+4, STBTE__cscrollbar, STBTE__INDEX_FOR_ID(id,0,0));
+ break;
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ // check if it's over the thumb
+ stbte__activate(id);
+ *val = ((stbte__ui.my-y0) * (v1 - v0 - num_vis) + (y1-y0)/2)/ (y1-y0);
+ }
+ break;
+ case STBTE__mousemove:
+ if (STBTE__IS_ACTIVE(id) && stbte__ui.mx >= x-15 && stbte__ui.mx <= x+15)
+ *val = ((stbte__ui.my-y0) * (v1 - v0 - num_vis) + (y1-y0)/2)/ (y1-y0);
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_ACTIVE(id))
+ stbte__activate(0);
+ break;
+
+ }
+
+ if (*val >= v1-num_vis)
+ *val = v1-num_vis;
+ if (*val <= v0)
+ *val = v0;
+}
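The scrollbar's inverse mapping adds `(y1-y0)/2` before dividing by `y1-y0`, which makes the integer division round to nearest instead of truncating toward zero. The idiom in isolation (illustrative helper, assuming non-negative numerator and positive divisor):

```c
#include <assert.h>

/* round-to-nearest integer division: the "+ divisor/2" trick used in
   stbte__scrollbar's pixel-to-scroll-position conversion */
static int ex_div_round(int a, int b)
{
    return (a + b / 2) / b;
}
```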
+
+
+static void stbte__compute_digits(stbte_tilemap *tm)
+{
+ if (tm->max_x >= 1000 || tm->max_y >= 1000)
+ tm->digits = 4;
+ else if (tm->max_x >= 100 || tm->max_y >= 100)
+ tm->digits = 3;
+ else
+ tm->digits = 2;
+}
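`stbte__compute_digits` just buckets the larger map dimension into a field width of 2 to 4 digits for the info panel. The same thresholds as a standalone function (illustrative name):

```c
#include <assert.h>

/* mirror of stbte__compute_digits' thresholds */
static int ex_digits_for(int max_x, int max_y)
{
    if (max_x >= 1000 || max_y >= 1000) return 4;
    if (max_x >= 100 || max_y >= 100) return 3;
    return 2;
}
```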
+
+static int stbte__is_single_selection(void)
+{
+ return stbte__ui.has_selection
+ && stbte__ui.select_x0 == stbte__ui.select_x1
+ && stbte__ui.select_y0 == stbte__ui.select_y1;
+}
+
+typedef struct
+{
+ int width, height;
+ int x,y;
+ int active;
+ float retracted;
+} stbte__region_t;
+
+static stbte__region_t stbte__region[4];
+
+#define STBTE__TOOLBAR_ICON_SIZE (9+2*2)
+#define STBTE__TOOLBAR_PASTE_SIZE (34+2*2)
+
+// This routine computes where every panel goes onscreen: computes
+// a minimum width for each side based on which panels are on that
+// side, and accounts for width-dependent layout of certain panels.
+static void stbte__compute_panel_locations(stbte_tilemap *tm)
+{
+ int i, limit, w, k;
+ int window_width = stbte__ui.x1 - stbte__ui.x0;
+ int window_height = stbte__ui.y1 - stbte__ui.y0;
+ int min_width[STBTE__num_panel]={0,0,0,0,0,0,0};
+ int height[STBTE__num_panel]={0,0,0,0,0,0,0};
+ int panel_active[STBTE__num_panel]={1,0,1,1,1,1,1};
+ int vpos[4] = { 0,0,0,0 };
+ stbte__panel *p = stbte__ui.panel;
+ stbte__panel *pt = &p[STBTE__panel_toolbar];
+#ifdef STBTE__NO_PROPS
+ int props = 0;
+#else
+ int props = 1;
+#endif
+
+ for (i=0; i < 4; ++i) {
+ stbte__region[i].active = 0;
+ stbte__region[i].width = 0;
+ stbte__region[i].height = 0;
+ }
+
+ // compute number of digits needed for info panel
+ stbte__compute_digits(tm);
+
+ // determine which panels are active
+ panel_active[STBTE__panel_categories] = tm->num_categories != 0;
+ panel_active[STBTE__panel_layers ] = tm->num_layers > 1;
+#ifdef STBTE__COLORPICKER
+ panel_active[STBTE__panel_colorpick ] = 1;
+#endif
+
+ panel_active[STBTE__panel_props ] = props && stbte__is_single_selection();
+
+ // compute minimum widths for each panel (assuming they're on sides not top)
+ min_width[STBTE__panel_info ] = 8 + 11 + 7*tm->digits+17+7; // estimate min width of "w:0000"
+ min_width[STBTE__panel_colorpick ] = 120;
+ min_width[STBTE__panel_tiles ] = 4 + tm->palette_spacing_x + 5; // 5 for scrollbar
+ min_width[STBTE__panel_categories] = 4 + 42 + 5; // 42 is enough to show ~7 chars; 5 for scrollbar
+ min_width[STBTE__panel_layers ] = 4 + 54 + 30*tm->has_layer_names; // 2 digits plus 3 buttons plus scrollbar
+ min_width[STBTE__panel_toolbar ] = 4 + STBTE__TOOLBAR_PASTE_SIZE; // wide enough for 'Paste' button
+ min_width[STBTE__panel_props ] = 80; // narrowest info panel
+
+ // compute minimum widths for left & right panels based on the above
+ stbte__region[0].width = stbte__ui.left_width;
+ stbte__region[1].width = stbte__ui.right_width;
+
+ for (i=0; i < STBTE__num_panel; ++i) {
+ if (panel_active[i]) {
+ int side = stbte__ui.panel[i].side;
+ if (min_width[i] > stbte__region[side].width)
+ stbte__region[side].width = min_width[i];
+ stbte__region[side].active = 1;
+ }
+ }
+
+ // now compute the heights of each panel
+
+ // if toolbar at top, compute its size & push the left and right start points down
+ if (stbte__region[STBTE__side_top].active) {
+ int height = STBTE__TOOLBAR_ICON_SIZE+2;
+ pt->x0 = stbte__ui.x0;
+ pt->y0 = stbte__ui.y0;
+ pt->width = window_width;
+ pt->height = height;
+ vpos[STBTE__side_left] = vpos[STBTE__side_right] = height;
+ } else {
+ int num_rows = STBTE__num_tool * ((stbte__region[pt->side].width-4)/STBTE__TOOLBAR_ICON_SIZE);
+ height[STBTE__panel_toolbar] = num_rows*13 + 3*15 + 4; // 3*15 for cut/copy/paste, which are stacked vertically
+ }
+
+ for (i=0; i < 4; ++i)
+ stbte__region[i].y = stbte__ui.y0 + vpos[i];
+
+ for (i=0; i < 2; ++i) {
+ int anim = (int) (stbte__region[i].width * stbte__region[i].retracted);
+ stbte__region[i].x = (i == STBTE__side_left) ? stbte__ui.x0 - anim : stbte__ui.x1 - stbte__region[i].width + anim;
+ }
+
+ // color picker
+ height[STBTE__panel_colorpick] = 300;
+
+ // info panel
+ w = stbte__region[p[STBTE__panel_info].side].width;
+ p[STBTE__panel_info].mode = (w >= 8 + (11+7*tm->digits+17)*2 + 4);
+ if (p[STBTE__panel_info].mode)
+ height[STBTE__panel_info] = 5 + 11*2 + 2 + tm->palette_spacing_y;
+ else
+ height[STBTE__panel_info] = 5 + 11*4 + 2 + tm->palette_spacing_y;
+
+ // layers
+ limit = 6 + stbte__ui.panel[STBTE__panel_layers].delta_height;
+ height[STBTE__panel_layers] = (tm->num_layers > limit ? limit : tm->num_layers)*15 + 7 + (tm->has_layer_names ? 0 : 11) + props*13;
+
+ // categories
+ limit = 6 + stbte__ui.panel[STBTE__panel_categories].delta_height;
+ height[STBTE__panel_categories] = (tm->num_categories+1 > limit ? limit : tm->num_categories+1)*11 + 14;
+ if (stbte__ui.panel[STBTE__panel_categories].side == stbte__ui.panel[STBTE__panel_tiles].side)
+ height[STBTE__panel_categories] -= 4;
+
+ // palette
+ k = (stbte__region[p[STBTE__panel_tiles].side].width - 8) / tm->palette_spacing_x;
+ if (k == 0) k = 1;
+ height[STBTE__panel_tiles] = ((tm->num_tiles+k-1)/k) * tm->palette_spacing_y + 8;
+
+ // properties panel
+ height[STBTE__panel_props] = 9 + STBTE_MAX_PROPERTIES*14;
+
+ // now compute the locations of all the panels
+ for (i=0; i < STBTE__num_panel; ++i) {
+ if (panel_active[i]) {
+ int side = p[i].side;
+ if (side == STBTE__side_left || side == STBTE__side_right) {
+ p[i].width = stbte__region[side].width;
+ p[i].x0 = stbte__region[side].x;
+ p[i].y0 = stbte__ui.y0 + vpos[side];
+ p[i].height = height[i];
+ vpos[side] += height[i];
+ if (vpos[side] > window_height) {
+ vpos[side] = window_height;
+ p[i].height = stbte__ui.y1 - p[i].y0;
+ }
+ } else {
+ ; // it's at top, it's already been explicitly set up earlier
+ }
+ } else {
+ // inactive panel
+ p[i].height = 0;
+ p[i].width = 0;
+ p[i].x0 = stbte__ui.x1;
+ p[i].y0 = stbte__ui.y1;
+ }
+ }
+}
+
+// unique identifiers for imgui
+enum
+{
+ STBTE__map=1,
+ STBTE__region,
+ STBTE__panel, // panel background to hide map, and misc controls
+ STBTE__info, // info data
+ STBTE__toolbarA, STBTE__toolbarB, // toolbar buttons: param is tool number
+ STBTE__palette, // palette selectors: param is tile index
+ STBTE__categories, // category selectors: param is category index
+ STBTE__layer, // layer selector: param is layer number
+ STBTE__solo, STBTE__hide, STBTE__lock, // layer controls: param is layer
+ STBTE__scrollbar, // param is panel ID
+ STBTE__panel_mover, // p1 is panel ID, p2 is destination side
+ STBTE__panel_sizer, // param panel ID
+ STBTE__scrollbar_id,
+ STBTE__colorpick_id,
+ STBTE__prop_flag,
+ STBTE__prop_float,
+ STBTE__prop_int,
+};
+
+// id is: [ 24-bit data : 7-bit identifier ]
+// map id is: [ 12-bit x : 12-bit y : 7-bit identifier ]
+
+#define STBTE__ID(n,p) ((n) + ((p)<<7))
+#define STBTE__ID2(n,p,q) STBTE__ID(n, ((p)<<12)+(q) )
+#define STBTE__IDMAP(x,y) STBTE__ID2(STBTE__map, x,y)
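The ID macros pack a widget kind into the low 7 bits and up to 24 bits of payload above it; for map squares the payload is `(x<<12)+y`. A sketch replicating the packing plus decoders for the three fields (the decoders are illustrative and not part of stb_tilemap_editor):

```c
#include <assert.h>

/* same packing as STBTE__ID / STBTE__ID2 / STBTE__IDMAP */
#define EX_MAP        1                       /* stands in for STBTE__map */
#define EX_ID(n,p)    ((n) + ((p)<<7))
#define EX_ID2(n,p,q) EX_ID(n, ((p)<<12)+(q))
#define EX_IDMAP(x,y) EX_ID2(EX_MAP, x, y)

static int ex_id_kind(int id) { return id & 127; }           /* bits 0..6   */
static int ex_map_y(int id)   { return (id >> 7) & 0xFFF; }  /* bits 7..18  */
static int ex_map_x(int id)   { return (id >> 19) & 0xFFF; } /* bits 19..30 */
```

Decoding shows that `x` occupies the higher 12-bit field, which is why the map hit-testing code can compare whole IDs cheaply.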
+
+static void stbte__activate_map(int x, int y)
+{
+ stbte__ui.active_id = STBTE__IDMAP(x,y);
+ stbte__ui.active_event = stbte__ui.event;
+ stbte__ui.sx = x;
+ stbte__ui.sy = y;
+}
+
+static void stbte__alert(const char *msg)
+{
+ stbte__ui.alert_msg = msg;
+ stbte__ui.alert_timer = 3;
+}
+
+#define STBTE__BG(tm,layer) ((layer) == 0 ? (tm)->background_tile : STBTE__NO_TILE)
+
+
+
+static void stbte__brush_predict(stbte_tilemap *tm, short result[])
+{
+ stbte__tileinfo *ti;
+ int i;
+
+ if (tm->cur_tile < 0) return;
+
+ ti = &tm->tiles[tm->cur_tile];
+
+ // find lowest legit layer to paint it on, and put it there
+ for (i=0; i < tm->num_layers; ++i) {
+ // check if object is allowed on layer
+ if (!(ti->layermask & (1 << i)))
+ continue;
+
+ if (i != tm->solo_layer) {
+ // if there's a selected layer, can only paint on that
+ if (tm->cur_layer >= 0 && i != tm->cur_layer)
+ continue;
+
+ // if the layer is hidden, we can't see it
+ if (tm->layerinfo[i].hidden)
+ continue;
+
+ // if the layer is locked, we can't write to it
+ if (tm->layerinfo[i].locked == STBTE__locked)
+ continue;
+
+ // if the layer is non-empty and protected, can't write to it
+ if (tm->layerinfo[i].locked == STBTE__protected && result[i] != STBTE__BG(tm,i))
+ continue;
+ }
+
+ result[i] = ti->id;
+ return;
+ }
+}
+
+static void stbte__brush(stbte_tilemap *tm, int x, int y)
+{
+ stbte__tileinfo *ti;
+
+ // find lowest legit layer to paint it on, and put it there
+ int i;
+
+ if (tm->cur_tile < 0) return;
+
+ ti = &tm->tiles[tm->cur_tile];
+
+ for (i=0; i < tm->num_layers; ++i) {
+ // check if object is allowed on layer
+ if (!(ti->layermask & (1 << i)))
+ continue;
+
+ if (i != tm->solo_layer) {
+ // if there's a selected layer, can only paint on that
+ if (tm->cur_layer >= 0 && i != tm->cur_layer)
+ continue;
+
+ // if the layer is hidden, we can't see it
+ if (tm->layerinfo[i].hidden)
+ continue;
+
+ // if the layer is locked, we can't write to it
+ if (tm->layerinfo[i].locked == STBTE__locked)
+ continue;
+
+ // if the layer is non-empty and protected, can't write to it
+ if (tm->layerinfo[i].locked == STBTE__protected && tm->data[y][x][i] != STBTE__BG(tm,i))
+ continue;
+ }
+
+ stbte__undo_record(tm,x,y,i,tm->data[y][x][i]);
+ tm->data[y][x][i] = ti->id;
+ return;
+ }
+
+ //stbte__alert("Selected tile not valid on active layer(s)");
+}
+
+enum
+{
+ STBTE__erase_none = -1,
+ STBTE__erase_brushonly = 0,
+ STBTE__erase_any = 1,
+ STBTE__erase_all = 2,
+};
+
+static int stbte__erase_predict(stbte_tilemap *tm, short result[], int allow_any)
+{
+ stbte__tileinfo *ti = tm->cur_tile >= 0 ? &tm->tiles[tm->cur_tile] : NULL;
+ int i;
+
+ if (allow_any == STBTE__erase_none)
+ return allow_any;
+
+ // first check if only one layer is legit
+ i = tm->cur_layer;
+ if (tm->solo_layer >= 0)
+ i = tm->solo_layer;
+
+ // if only one layer is legit, directly process that one for clarity
+ if (i >= 0) {
+ short bg = (i == 0 ? tm->background_tile : -1);
+ if (tm->solo_layer < 0) {
+ // check that we're allowed to write to it
+ if (tm->layerinfo[i].hidden) return STBTE__erase_none;
+ if (tm->layerinfo[i].locked) return STBTE__erase_none;
+ }
+ if (result[i] == bg)
+ return STBTE__erase_none; // didn't erase anything
+ if (ti && result[i] == ti->id && (i != 0 || ti->id != tm->background_tile)) {
+ result[i] = bg;
+ return STBTE__erase_brushonly;
+ }
+ if (allow_any == STBTE__erase_any) {
+ result[i] = bg;
+ return STBTE__erase_any;
+ }
+ return STBTE__erase_none;
+ }
+
+ // if multiple layers are legit, first scan all for brush data
+
+ if (ti && allow_any != STBTE__erase_all) {
+ for (i=tm->num_layers-1; i >= 0; --i) {
+ if (result[i] != ti->id)
+ continue;
+ if (tm->layerinfo[i].locked || tm->layerinfo[i].hidden)
+ continue;
+ if (i == 0 && result[i] == tm->background_tile)
+ return STBTE__erase_none;
+ result[i] = STBTE__BG(tm,i);
+ return STBTE__erase_brushonly;
+ }
+ }
+
+ if (allow_any != STBTE__erase_any && allow_any != STBTE__erase_all)
+ return STBTE__erase_none;
+
+ // apply layer filters, erase from top
+ for (i=tm->num_layers-1; i >= 0; --i) {
+ if (result[i] < 0)
+ continue;
+ if (tm->layerinfo[i].locked || tm->layerinfo[i].hidden)
+ continue;
+ if (i == 0 && result[i] == tm->background_tile)
+ return STBTE__erase_none;
+ result[i] = STBTE__BG(tm,i);
+ if (allow_any != STBTE__erase_all)
+ return STBTE__erase_any;
+ }
+
+ if (allow_any == STBTE__erase_all)
+ return allow_any;
+ return STBTE__erase_none;
+}
+
+static int stbte__erase(stbte_tilemap *tm, int x, int y, int allow_any)
+{
+ stbte__tileinfo *ti = tm->cur_tile >= 0 ? &tm->tiles[tm->cur_tile] : NULL;
+ int i;
+
+ if (allow_any == STBTE__erase_none)
+ return allow_any;
+
+ // first check if only one layer is legit
+ i = tm->cur_layer;
+ if (tm->solo_layer >= 0)
+ i = tm->solo_layer;
+
+ // if only one layer is legit, directly process that one for clarity
+ if (i >= 0) {
+ short bg = (i == 0 ? tm->background_tile : -1);
+ if (tm->solo_layer < 0) {
+ // check that we're allowed to write to it
+ if (tm->layerinfo[i].hidden) return STBTE__erase_none;
+ if (tm->layerinfo[i].locked) return STBTE__erase_none;
+ }
+ if (tm->data[y][x][i] == bg)
+ return STBTE__erase_none; // didn't erase anything
+ if (ti && tm->data[y][x][i] == ti->id && (i != 0 || ti->id != tm->background_tile)) {
+ stbte__undo_record(tm,x,y,i,tm->data[y][x][i]);
+ tm->data[y][x][i] = bg;
+ return STBTE__erase_brushonly;
+ }
+ if (allow_any == STBTE__erase_any) {
+ stbte__undo_record(tm,x,y,i,tm->data[y][x][i]);
+ tm->data[y][x][i] = bg;
+ return STBTE__erase_any;
+ }
+ return STBTE__erase_none;
+ }
+
+ // if multiple layers are legit, first scan all for brush data
+
+ if (ti && allow_any != STBTE__erase_all) {
+ for (i=tm->num_layers-1; i >= 0; --i) {
+ if (tm->data[y][x][i] != ti->id)
+ continue;
+ if (tm->layerinfo[i].locked || tm->layerinfo[i].hidden)
+ continue;
+ if (i == 0 && tm->data[y][x][i] == tm->background_tile)
+ return STBTE__erase_none;
+ stbte__undo_record(tm,x,y,i,tm->data[y][x][i]);
+ tm->data[y][x][i] = STBTE__BG(tm,i);
+ return STBTE__erase_brushonly;
+ }
+ }
+
+ if (allow_any != STBTE__erase_any && allow_any != STBTE__erase_all)
+ return STBTE__erase_none;
+
+ // apply layer filters, erase from top
+ for (i=tm->num_layers-1; i >= 0; --i) {
+ if (tm->data[y][x][i] < 0)
+ continue;
+ if (tm->layerinfo[i].locked || tm->layerinfo[i].hidden)
+ continue;
+ if (i == 0 && tm->data[y][x][i] == tm->background_tile)
+ return STBTE__erase_none;
+ stbte__undo_record(tm,x,y,i,tm->data[y][x][i]);
+ tm->data[y][x][i] = STBTE__BG(tm,i);
+ if (allow_any != STBTE__erase_all)
+ return STBTE__erase_any;
+ }
+ if (allow_any == STBTE__erase_all)
+ return allow_any;
+ return STBTE__erase_none;
+}
+
+static int stbte__find_tile(stbte_tilemap *tm, int tile_id)
+{
+ int i;
+ for (i=0; i < tm->num_tiles; ++i)
+ if (tm->tiles[i].id == tile_id)
+ return i;
+ stbte__alert("Eyedropped tile that isn't in tileset");
+ return -1;
+}
+
+static void stbte__eyedrop(stbte_tilemap *tm, int x, int y)
+{
+ int i,j;
+
+ // flush eyedropper state
+ if (stbte__ui.eyedrop_x != x || stbte__ui.eyedrop_y != y) {
+ stbte__ui.eyedrop_x = x;
+ stbte__ui.eyedrop_y = y;
+ stbte__ui.eyedrop_last_layer = tm->num_layers;
+ }
+
+ // if only one layer is active, query that
+ i = tm->cur_layer;
+ if (tm->solo_layer >= 0)
+ i = tm->solo_layer;
+ if (i >= 0) {
+ if (tm->data[y][x][i] == STBTE__NO_TILE)
+ return;
+ tm->cur_tile = stbte__find_tile(tm, tm->data[y][x][i]);
+ return;
+ }
+
+ // if multiple layers, continue from previous
+ i = stbte__ui.eyedrop_last_layer;
+ for (j=0; j < tm->num_layers; ++j) {
+ if (--i < 0)
+ i = tm->num_layers-1;
+ if (tm->layerinfo[i].hidden)
+ continue;
+ if (tm->data[y][x][i] == STBTE__NO_TILE)
+ continue;
+ stbte__ui.eyedrop_last_layer = i;
+ tm->cur_tile = stbte__find_tile(tm, tm->data[y][x][i]);
+ return;
+ }
+}
+
+static int stbte__should_copy_properties(stbte_tilemap *tm)
+{
+ int i;
+ if (tm->propmode == STBTE__propmode_always)
+ return 1;
+ if (tm->propmode == STBTE__propmode_never)
+ return 0;
+ if (tm->solo_layer >= 0 || tm->cur_layer >= 0)
+ return 0;
+ for (i=0; i < tm->num_layers; ++i)
+ if (tm->layerinfo[i].hidden || tm->layerinfo[i].locked)
+ return 0;
+ return 1;
+}
+
+// compute the result of pasting into a tile non-destructively so we can preview it
+static void stbte__paste_stack(stbte_tilemap *tm, short result[], short dest[], short src[], int dragging)
+{
+ int i;
+
+ // special case single-layer
+ i = tm->cur_layer;
+ if (tm->solo_layer >= 0)
+ i = tm->solo_layer;
+ if (i >= 0) {
+ if (tm->solo_layer < 0) {
+ // check that we're allowed to write to it
+ if (tm->layerinfo[i].hidden) return;
+ if (tm->layerinfo[i].locked == STBTE__locked) return;
+ // if protected, dest has to be empty
+ if (tm->layerinfo[i].locked == STBTE__protected && dest[i] != STBTE__BG(tm,i)) return;
+ // if dragging w/o copy, we will try to erase stuff, which protection disallows
+ if (dragging && tm->layerinfo[i].locked == STBTE__protected)
+ return;
+ }
+ result[i] = dest[i];
+ if (src[i] != STBTE__BG(tm,i))
+ result[i] = src[i];
+ return;
+ }
+
+ for (i=0; i < tm->num_layers; ++i) {
+ result[i] = dest[i];
+ if (src[i] != STBTE__NO_TILE)
+ if (!tm->layerinfo[i].hidden && tm->layerinfo[i].locked != STBTE__locked)
+ if (tm->layerinfo[i].locked == STBTE__unlocked || (!dragging && dest[i] == STBTE__BG(tm,i)))
+ result[i] = src[i];
+ }
+}
+
+// compute the result of dragging away from a tile
+static void stbte__clear_stack(stbte_tilemap *tm, short result[])
+{
+ int i;
+ // special case single-layer
+ i = tm->cur_layer;
+ if (tm->solo_layer >= 0)
+ i = tm->solo_layer;
+ if (i >= 0)
+ result[i] = STBTE__BG(tm,i);
+ else
+ for (i=0; i < tm->num_layers; ++i)
+ if (!tm->layerinfo[i].hidden && tm->layerinfo[i].locked == STBTE__unlocked)
+ result[i] = STBTE__BG(tm,i);
+}
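At the core of `stbte__paste_stack` is a per-layer rule: a pasted cell wins only if the source actually holds a tile there; otherwise the destination survives. A single-layer distillation of that rule, with locking and hiding omitted for brevity (names are illustrative):

```c
#include <assert.h>

#define EX_NO_TILE (-1) /* stands in for STBTE__NO_TILE */

/* simplified paste rule: source tile overrides destination only when
   the source cell is non-empty */
static short ex_paste_cell(short dest, short src)
{
    return (src != EX_NO_TILE) ? src : dest;
}
```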
+
+// check if some map square is active
+#define STBTE__IS_MAP_ACTIVE() ((stbte__ui.active_id & 127) == STBTE__map)
+#define STBTE__IS_MAP_HOT() ((stbte__ui.hot_id & 127) == STBTE__map)
+
+static void stbte__fillrect(stbte_tilemap *tm, int x0, int y0, int x1, int y1, int fill)
+{
+ int i,j;
+
+ stbte__begin_undo(tm);
+ if (x0 > x1) i=x0,x0=x1,x1=i;
+ if (y0 > y1) j=y0,y0=y1,y1=j;
+ for (j=y0; j <= y1; ++j)
+ for (i=x0; i <= x1; ++i)
+ if (fill)
+ stbte__brush(tm, i,j);
+ else
+ stbte__erase(tm, i,j,STBTE__erase_any);
+ stbte__end_undo(tm);
+ // suppress warning from brush
+ stbte__ui.alert_msg = 0;
+}
+
+static void stbte__select_rect(stbte_tilemap *tm, int x0, int y0, int x1, int y1)
+{
+ stbte__ui.has_selection = 1;
+ stbte__ui.select_x0 = (x0 < x1 ? x0 : x1);
+ stbte__ui.select_x1 = (x0 < x1 ? x1 : x0);
+ stbte__ui.select_y0 = (y0 < y1 ? y0 : y1);
+ stbte__ui.select_y1 = (y0 < y1 ? y1 : y0);
+}
+
+static void stbte__copy_properties(float *dest, float *src)
+{
+ int i;
+ for (i=0; i < STBTE_MAX_PROPERTIES; ++i)
+ dest[i] = src[i];
+}
+
+static void stbte__copy_cut(stbte_tilemap *tm, int cut)
+{
+ int i,j,n,w,h,p=0;
+ int copy_props = stbte__should_copy_properties(tm);
+ if (!stbte__ui.has_selection)
+ return;
+ w = stbte__ui.select_x1 - stbte__ui.select_x0 + 1;
+ h = stbte__ui.select_y1 - stbte__ui.select_y0 + 1;
+ if (STBTE_MAX_COPY / w < h) {
+ stbte__alert("Selection too large for copy buffer, increase STBTE_MAX_COPY");
+ return;
+ }
+
+ for (i=0; i < w*h; ++i)
+ for (n=0; n < tm->num_layers; ++n)
+ stbte__ui.copybuffer[i][n] = STBTE__NO_TILE;
+
+ if (cut)
+ stbte__begin_undo(tm);
+ for (j=stbte__ui.select_y0; j <= stbte__ui.select_y1; ++j) {
+ for (i=stbte__ui.select_x0; i <= stbte__ui.select_x1; ++i) {
+ for (n=0; n < tm->num_layers; ++n) {
+ if (tm->solo_layer >= 0) {
+ if (tm->solo_layer != n)
+ continue;
+ } else {
+ if (tm->cur_layer >= 0)
+ if (tm->cur_layer != n)
+ continue;
+ if (tm->layerinfo[n].hidden)
+ continue;
+ if (cut && tm->layerinfo[n].locked)
+ continue;
+ }
+ stbte__ui.copybuffer[p][n] = tm->data[j][i][n];
+ if (cut) {
+ stbte__undo_record(tm,i,j,n, tm->data[j][i][n]);
+ tm->data[j][i][n] = (n==0 ? tm->background_tile : -1);
+ }
+ }
+ if (copy_props) {
+ stbte__copy_properties(stbte__ui.copyprops[p], tm->props[j][i]);
+#ifdef STBTE_ALLOW_LINK
+ stbte__ui.copylinks[p] = tm->link[j][i];
+ if (cut)
+ stbte__set_link(tm, i,j,-1,-1, STBTE__undo_record);
+#endif
+ }
+ ++p;
+ }
+ }
+ if (cut)
+ stbte__end_undo(tm);
+ stbte__ui.copy_width = w;
+ stbte__ui.copy_height = h;
+ stbte__ui.has_copy = 1;
+ //stbte__ui.has_selection = 0;
+ stbte__ui.copy_has_props = copy_props;
+ stbte__ui.copy_src = tm; // used to give better semantics when copying links
+ stbte__ui.copy_src_x = stbte__ui.select_x0;
+ stbte__ui.copy_src_y = stbte__ui.select_y0;
+}
+
+static int stbte__in_rect(int x, int y, int x0, int y0, int w, int h)
+{
+ return x >= x0 && x < x0+w && y >= y0 && y < y0+h;
+}
+
+#ifdef STBTE_ALLOW_LINK
+static int stbte__in_src_rect(int x, int y)
+{
+ return stbte__in_rect(x,y, stbte__ui.copy_src_x, stbte__ui.copy_src_y, stbte__ui.copy_width, stbte__ui.copy_height);
+}
+
+static int stbte__in_dest_rect(int x, int y, int destx, int desty)
+{
+ return stbte__in_rect(x,y, destx, desty, stbte__ui.copy_width, stbte__ui.copy_height);
+}
+#endif
+
+static void stbte__paste(stbte_tilemap *tm, int mapx, int mapy)
+{
+ int w = stbte__ui.copy_width;
+ int h = stbte__ui.copy_height;
+ int i,j,k,p;
+ int x = mapx - (w>>1);
+ int y = mapy - (h>>1);
+ int copy_props = stbte__should_copy_properties(tm) && stbte__ui.copy_has_props;
+ if (stbte__ui.has_copy == 0)
+ return;
+ stbte__begin_undo(tm);
+ p = 0;
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; ++i) {
+ if (y+j >= 0 && y+j < tm->max_y && x+i >= 0 && x+i < tm->max_x) {
+ // compute the new stack
+ short tilestack[STBTE_MAX_LAYERS];
+ for (k=0; k < tm->num_layers; ++k)
+ tilestack[k] = tm->data[y+j][x+i][k];
+ stbte__paste_stack(tm, tilestack, tilestack, stbte__ui.copybuffer[p], 0);
+ // update anything that changed
+ for (k=0; k < tm->num_layers; ++k) {
+ if (tilestack[k] != tm->data[y+j][x+i][k]) {
+ stbte__undo_record(tm, x+i,y+j,k, tm->data[y+j][x+i][k]);
+ tm->data[y+j][x+i][k] = tilestack[k];
+ }
+ }
+ }
+ if (copy_props) {
+#ifdef STBTE_ALLOW_LINK
+ // need to decide how to paste a link, so there's a few cases
+ int destx = -1, desty = -1;
+ stbte__link *link = &stbte__ui.copylinks[p];
+
+ // check if link is within-rect
+ if (stbte__in_src_rect(link->x, link->y)) {
+ // new link should point to copy (but only if copy is within map)
+ destx = x + (link->x - stbte__ui.copy_src_x);
+ desty = y + (link->y - stbte__ui.copy_src_y);
+ } else if (tm == stbte__ui.copy_src) {
+ // if same map, then preserve link unless target is overwritten
+ if (!stbte__in_dest_rect(link->x,link->y,x,y)) {
+ destx = link->x;
+ desty = link->y;
+ }
+ }
+ // this is necessary for offset-copy, but also in case max_x/max_y has changed
+ if (destx < 0 || destx >= tm->max_x || desty < 0 || desty >= tm->max_y)
+ destx = -1, desty = -1;
+ stbte__set_link(tm, x+i, y+j, destx, desty, STBTE__undo_record);
+#endif
+ for (k=0; k < STBTE_MAX_PROPERTIES; ++k) {
+ if (tm->props[y+j][x+i][k] != stbte__ui.copyprops[p][k])
+ stbte__undo_record_prop_float(tm, x+i, y+j, k, tm->props[y+j][x+i][k]);
+ }
+ stbte__copy_properties(tm->props[y+j][x+i], stbte__ui.copyprops[p]);
+ }
+ ++p;
+ }
+ }
+ stbte__end_undo(tm);
+}
+
+static void stbte__drag_update(stbte_tilemap *tm, int mapx, int mapy, int copy_props)
+{
+ int w = stbte__ui.drag_w, h = stbte__ui.drag_h;
+ int ox,oy,i,deleted=0,written=0;
+ short temp[STBTE_MAX_LAYERS];
+ short *data = NULL;
+
+ STBTE__NOTUSED(deleted);
+ STBTE__NOTUSED(written);
+
+ if (!stbte__ui.shift) {
+ ox = mapx - stbte__ui.drag_x;
+ oy = mapy - stbte__ui.drag_y;
+ if (ox >= 0 && ox < w && oy >= 0 && oy < h) {
+ deleted=1;
+ for (i=0; i < tm->num_layers; ++i)
+ temp[i] = tm->data[mapy][mapx][i];
+ data = temp;
+ stbte__clear_stack(tm, data);
+ }
+ }
+ ox = mapx - stbte__ui.drag_dest_x;
+ oy = mapy - stbte__ui.drag_dest_y;
+ // if this map square is in the target drag region
+ if (ox >= 0 && ox < w && oy >= 0 && oy < h) {
+ // and the src map square is on the map
+ if (stbte__in_rect(stbte__ui.drag_x+ox, stbte__ui.drag_y+oy, 0, 0, tm->max_x, tm->max_y)) {
+ written = 1;
+ if (data == NULL) {
+ for (i=0; i < tm->num_layers; ++i)
+ temp[i] = tm->data[mapy][mapx][i];
+ data = temp;
+ }
+ stbte__paste_stack(tm, data, data, tm->data[stbte__ui.drag_y+oy][stbte__ui.drag_x+ox], !stbte__ui.shift);
+ if (copy_props) {
+ for (i=0; i < STBTE_MAX_PROPERTIES; ++i) {
+ if (tm->props[mapy][mapx][i] != tm->props[stbte__ui.drag_y+oy][stbte__ui.drag_x+ox][i]) {
+ stbte__undo_record_prop_float(tm, mapx, mapy, i, tm->props[mapy][mapx][i]);
+ tm->props[mapy][mapx][i] = tm->props[stbte__ui.drag_y+oy][stbte__ui.drag_x+ox][i];
+ }
+ }
+ }
+ }
+ }
+ if (data) {
+ for (i=0; i < tm->num_layers; ++i) {
+ if (tm->data[mapy][mapx][i] != data[i]) {
+ stbte__undo_record(tm, mapx, mapy, i, tm->data[mapy][mapx][i]);
+ tm->data[mapy][mapx][i] = data[i];
+ }
+ }
+ }
+ #ifdef STBTE_ALLOW_LINK
+ if (copy_props) {
+ int overwritten=0, moved=0, copied=0;
+ // since this function is called on EVERY tile, we can fix up even tiles not
+ // involved in the move
+
+ stbte__link *k;
+ // first, determine what src link ends up here
+ k = &tm->link[mapy][mapx]; // by default, it's the one currently here
+ if (deleted) // if dragged away, it's erased
+ k = NULL;
+ if (written) // if dragged into, it gets that link
+ k = &tm->link[stbte__ui.drag_y+oy][stbte__ui.drag_x+ox];
+
+ // now check whether the *target* gets moved or overwritten
+ if (k && k->x >= 0) {
+ overwritten = stbte__in_rect(k->x, k->y, stbte__ui.drag_dest_x, stbte__ui.drag_dest_y, w, h);
+ if (!stbte__ui.shift)
+ moved = stbte__in_rect(k->x, k->y, stbte__ui.drag_x , stbte__ui.drag_y , w, h);
+ else
+ copied = stbte__in_rect(k->x, k->y, stbte__ui.drag_x , stbte__ui.drag_y , w, h);
+ }
+
+ if (deleted || written || overwritten || moved || copied) {
+ // choose the final link value based on the above
+ if (k == NULL || k->x < 0)
+ stbte__set_link(tm, mapx, mapy, -1, -1, STBTE__undo_record);
+ else if (moved || (copied && written)) {
+ // if we move the target, we update to point to the new target;
+ // or, if we copy the target and the source is part of the copy, then update to new target
+ int x = k->x + (stbte__ui.drag_dest_x - stbte__ui.drag_x);
+ int y = k->y + (stbte__ui.drag_dest_y - stbte__ui.drag_y);
+ if (!(x >= 0 && y >= 0 && x < tm->max_x && y < tm->max_y))
+ x = -1, y = -1;
+ stbte__set_link(tm, mapx, mapy, x, y, STBTE__undo_record);
+ } else if (overwritten) {
+ stbte__set_link(tm, mapx, mapy, -1, -1, STBTE__undo_record);
+ } else
+ stbte__set_link(tm, mapx, mapy, k->x, k->y, STBTE__undo_record);
+ }
+ }
+ #endif
+}
+
+static void stbte__drag_place(stbte_tilemap *tm, int mapx, int mapy)
+{
+ int i,j;
+ int copy_props = stbte__should_copy_properties(tm);
+ int move_x = (stbte__ui.drag_dest_x - stbte__ui.drag_x);
+ int move_y = (stbte__ui.drag_dest_y - stbte__ui.drag_y);
+ if (move_x == 0 && move_y == 0)
+ return;
+
+ stbte__begin_undo(tm);
+ // we now need a 2D memmove-style mover that doesn't
+ // overwrite any data as it goes. this requires being
+ // direction sensitive in the same way as memmove
+ if (move_y > 0 || (move_y == 0 && move_x > 0)) {
+ for (j=tm->max_y-1; j >= 0; --j)
+ for (i=tm->max_x-1; i >= 0; --i)
+ stbte__drag_update(tm,i,j,copy_props);
+ } else {
+ for (j=0; j < tm->max_y; ++j)
+ for (i=0; i < tm->max_x; ++i)
+ stbte__drag_update(tm,i,j,copy_props);
+ }
+ stbte__end_undo(tm);
+
+ stbte__ui.has_selection = 1;
+ stbte__ui.select_x0 = stbte__ui.drag_dest_x;
+ stbte__ui.select_y0 = stbte__ui.drag_dest_y;
+ stbte__ui.select_x1 = stbte__ui.select_x0 + stbte__ui.drag_w - 1;
+ stbte__ui.select_y1 = stbte__ui.select_y0 + stbte__ui.drag_h - 1;
+}
+
+static void stbte__tile_paint(stbte_tilemap *tm, int sx, int sy, int mapx, int mapy, int layer)
+{
+ int i;
+ int id = STBTE__IDMAP(mapx,mapy);
+ int x0=sx, y0=sy;
+ int x1=sx+tm->spacing_x, y1=sy+tm->spacing_y;
+ short *data = tm->data[mapy][mapx];
+ short temp[STBTE_MAX_LAYERS];
+ stbte__hittest(x0,y0,x1,y1, id);
+
+ if (STBTE__IS_MAP_HOT()) {
+ if (stbte__ui.pasting) {
+ int ox = mapx - stbte__ui.paste_x;
+ int oy = mapy - stbte__ui.paste_y;
+ if (ox >= 0 && ox < stbte__ui.copy_width && oy >= 0 && oy < stbte__ui.copy_height) {
+ stbte__paste_stack(tm, temp, tm->data[mapy][mapx], stbte__ui.copybuffer[oy*stbte__ui.copy_width+ox], 0);
+ data = temp;
+ }
+ } else if (stbte__ui.dragging) {
+ int ox,oy;
+ for (i=0; i < tm->num_layers; ++i)
+ temp[i] = tm->data[mapy][mapx][i];
+ data = temp;
+
+ // if it's in the source area, remove things unless shift-dragging
+ ox = mapx - stbte__ui.drag_x;
+ oy = mapy - stbte__ui.drag_y;
+ if (!stbte__ui.shift && ox >= 0 && ox < stbte__ui.drag_w && oy >= 0 && oy < stbte__ui.drag_h) {
+ stbte__clear_stack(tm, temp);
+ }
+
+ ox = mapx - stbte__ui.drag_dest_x;
+ oy = mapy - stbte__ui.drag_dest_y;
+ if (ox >= 0 && ox < stbte__ui.drag_w && oy >= 0 && oy < stbte__ui.drag_h) {
+ stbte__paste_stack(tm, temp, temp, tm->data[stbte__ui.drag_y+oy][stbte__ui.drag_x+ox], !stbte__ui.shift);
+ }
+ } else if (STBTE__IS_MAP_ACTIVE()) {
+ if (stbte__ui.tool == STBTE__tool_rect) {
+ if ((stbte__ui.ms_time & 511) < 380) {
+ int ex = ((stbte__ui.hot_id >> 19) & 4095);
+ int ey = ((stbte__ui.hot_id >> 7) & 4095);
+ int sx = stbte__ui.sx;
+ int sy = stbte__ui.sy;
+
+ if ( ((mapx >= sx && mapx < ex+1) || (mapx >= ex && mapx < sx+1))
+ && ((mapy >= sy && mapy < ey+1) || (mapy >= ey && mapy < sy+1))) {
+ int i;
+ for (i=0; i < tm->num_layers; ++i)
+ temp[i] = tm->data[mapy][mapx][i];
+ data = temp;
+ if (stbte__ui.active_event == STBTE__leftdown)
+ stbte__brush_predict(tm, temp);
+ else
+ stbte__erase_predict(tm, temp, STBTE__erase_any);
+ }
+ }
+ }
+ }
+ }
+
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE() && !stbte__ui.pasting) {
+ if (stbte__ui.tool == STBTE__tool_brush) {
+ if ((stbte__ui.ms_time & 511) < 300) {
+ data = temp;
+ for (i=0; i < tm->num_layers; ++i)
+ temp[i] = tm->data[mapy][mapx][i];
+ stbte__brush_predict(tm, temp);
+ }
+ }
+ }
+
+ {
+ i = layer;
+ if (i == tm->solo_layer || (!tm->layerinfo[i].hidden && tm->solo_layer < 0))
+ if (data[i] >= 0)
+ STBTE_DRAW_TILE(sx,sy, (unsigned short) data[i], 0, tm->props[mapy][mapx]);
+ }
+}
+
+static void stbte__tile(stbte_tilemap *tm, int sx, int sy, int mapx, int mapy)
+{
+ int tool = stbte__ui.tool;
+ int x0=sx, y0=sy;
+ int x1=sx+tm->spacing_x, y1=sy+tm->spacing_y;
+ int id = STBTE__IDMAP(mapx,mapy);
+ int over = stbte__hittest(x0,y0,x1,y1, id);
+ switch (stbte__ui.event) {
+ case STBTE__paint: {
+ if (stbte__ui.pasting || stbte__ui.dragging || stbte__ui.scrolling)
+ break;
+ if (stbte__ui.scrollkey && !STBTE__IS_MAP_ACTIVE())
+ break;
+ if (STBTE__IS_HOT(id) && STBTE__IS_MAP_ACTIVE() && (tool == STBTE__tool_rect || tool == STBTE__tool_select)) {
+ int rx0,ry0,rx1,ry1,t;
+ // compute the center of each rect
+ rx0 = x0 + tm->spacing_x/2;
+ ry0 = y0 + tm->spacing_y/2;
+ rx1 = rx0 + (stbte__ui.sx - mapx) * tm->spacing_x;
+ ry1 = ry0 + (stbte__ui.sy - mapy) * tm->spacing_y;
+ if (rx0 > rx1) t=rx0,rx0=rx1,rx1=t;
+ if (ry0 > ry1) t=ry0,ry0=ry1,ry1=t;
+ rx0 -= tm->spacing_x/2;
+ ry0 -= tm->spacing_y/2;
+ rx1 += tm->spacing_x/2;
+ ry1 += tm->spacing_y/2;
+ stbte__draw_frame(rx0-1,ry0-1,rx1+1,ry1+1, STBTE_COLOR_TILEMAP_HIGHLIGHT);
+ break;
+ }
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__draw_frame(x0-1,y0-1,x1+1,y1+1, STBTE_COLOR_TILEMAP_HIGHLIGHT);
+ }
+#ifdef STBTE_ALLOW_LINK
+ if (stbte__ui.show_links && tm->link[mapy][mapx].x >= 0) {
+ int tx = tm->link[mapy][mapx].x;
+ int ty = tm->link[mapy][mapx].y;
+ int lx0,ly0,lx1,ly1;
+ if (STBTE_ALLOW_LINK(tm->data[mapy][mapx], tm->props[mapy][mapx],
+ tm->data[ty ][tx ], tm->props[ty ][tx ]))
+ {
+ lx0 = x0 + (tm->spacing_x >> 1) - 1;
+ ly0 = y0 + (tm->spacing_y >> 1) - 1;
+ lx1 = lx0 + (tx - mapx) * tm->spacing_x + 2;
+ ly1 = ly0 + (ty - mapy) * tm->spacing_y + 2;
+ stbte__draw_link(lx0,ly0,lx1,ly1,
+ STBTE_LINK_COLOR(tm->data[mapy][mapx], tm->props[mapy][mapx],
+ tm->data[ty ][tx ], tm->props[ty ][tx]));
+ }
+ }
+#endif
+ break;
+ }
+ }
+
+ if (stbte__ui.pasting) {
+ switch (stbte__ui.event) {
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id)) {
+ stbte__ui.pasting = 0;
+ stbte__paste(tm, mapx, mapy);
+ stbte__activate(0);
+ }
+ break;
+ case STBTE__leftup:
+ // just clear it no matter what, since they might click away to clear it
+ stbte__activate(0);
+ break;
+ case STBTE__rightdown:
+ if (STBTE__IS_HOT(id)) {
+ stbte__activate(0);
+ stbte__ui.pasting = 0;
+ }
+ break;
+ }
+ return;
+ }
+
+ if (stbte__ui.scrolling) {
+ if (stbte__ui.event == STBTE__leftup) {
+ stbte__activate(0);
+ stbte__ui.scrolling = 0;
+ }
+ if (stbte__ui.event == STBTE__mousemove) {
+ tm->scroll_x += (stbte__ui.start_x - stbte__ui.mx);
+ tm->scroll_y += (stbte__ui.start_y - stbte__ui.my);
+ stbte__ui.start_x = stbte__ui.mx;
+ stbte__ui.start_y = stbte__ui.my;
+ }
+ return;
+ }
+
+ // regardless of tool, leftdown with the scroll key held is a scrolldrag
+ if (STBTE__IS_HOT(id) && stbte__ui.scrollkey && stbte__ui.event == STBTE__leftdown) {
+ stbte__ui.scrolling = 1;
+ stbte__ui.start_x = stbte__ui.mx;
+ stbte__ui.start_y = stbte__ui.my;
+ return;
+ }
+
+ switch (tool) {
+ case STBTE__tool_brush:
+ switch (stbte__ui.event) {
+ case STBTE__mousemove:
+ if (STBTE__IS_MAP_ACTIVE() && over) {
+ // don't brush/erase same tile multiple times unless they move away and back @TODO should just be only once, but that needs another data structure
+ if (!STBTE__IS_ACTIVE(id)) {
+ if (stbte__ui.active_event == STBTE__leftdown)
+ stbte__brush(tm, mapx, mapy);
+ else
+ stbte__erase(tm, mapx, mapy, stbte__ui.brush_state);
+ stbte__ui.active_id = id; // switch to this map square so we don't rebrush IT multiple times
+ }
+ }
+ break;
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__activate(id);
+ stbte__begin_undo(tm);
+ stbte__brush(tm, mapx, mapy);
+ }
+ break;
+ case STBTE__rightdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__activate(id);
+ stbte__begin_undo(tm);
+ if (stbte__erase(tm, mapx, mapy, STBTE__erase_any) == STBTE__erase_brushonly)
+ stbte__ui.brush_state = STBTE__erase_brushonly;
+ else
+ stbte__ui.brush_state = STBTE__erase_any;
+ }
+ break;
+ case STBTE__leftup:
+ case STBTE__rightup:
+ if (STBTE__IS_MAP_ACTIVE()) {
+ stbte__end_undo(tm);
+ stbte__activate(0);
+ }
+ break;
+ }
+ break;
+
+#ifdef STBTE_ALLOW_LINK
+ case STBTE__tool_link:
+ switch (stbte__ui.event) {
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__activate(id);
+ stbte__ui.linking = 1;
+ stbte__ui.sx = mapx;
+ stbte__ui.sy = mapy;
+ // @TODO: undo
+ }
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_HOT(id) && STBTE__IS_MAP_ACTIVE()) {
+ if ((mapx != stbte__ui.sx || mapy != stbte__ui.sy) &&
+ STBTE_ALLOW_LINK(tm->data[stbte__ui.sy][stbte__ui.sx], tm->props[stbte__ui.sy][stbte__ui.sx],
+ tm->data[mapy][mapx], tm->props[mapy][mapx]))
+ stbte__set_link(tm, stbte__ui.sx, stbte__ui.sy, mapx, mapy, STBTE__undo_block);
+ else
+ stbte__set_link(tm, stbte__ui.sx, stbte__ui.sy, -1,-1, STBTE__undo_block);
+ stbte__ui.linking = 0;
+ stbte__activate(0);
+ }
+ break;
+
+ case STBTE__rightdown:
+ if (STBTE__IS_ACTIVE(id)) {
+ stbte__activate(0);
+ stbte__ui.linking = 0;
+ }
+ break;
+ }
+ break;
+#endif
+
+ case STBTE__tool_erase:
+ switch (stbte__ui.event) {
+ case STBTE__mousemove:
+ if (STBTE__IS_MAP_ACTIVE() && over)
+ stbte__erase(tm, mapx, mapy, STBTE__erase_all);
+ break;
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE()) {
+ stbte__activate(id);
+ stbte__begin_undo(tm);
+ stbte__erase(tm, mapx, mapy, STBTE__erase_all);
+ }
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_MAP_ACTIVE()) {
+ stbte__end_undo(tm);
+ stbte__activate(0);
+ }
+ break;
+ }
+ break;
+
+ case STBTE__tool_select:
+ if (STBTE__IS_HOT(id)) {
+ switch (stbte__ui.event) {
+ case STBTE__leftdown:
+ if (STBTE__INACTIVE()) {
+ // if we're clicking in an existing selection...
+ if (stbte__ui.has_selection) {
+ if ( mapx >= stbte__ui.select_x0 && mapx <= stbte__ui.select_x1
+ && mapy >= stbte__ui.select_y0 && mapy <= stbte__ui.select_y1)
+ {
+ stbte__ui.dragging = 1;
+ stbte__ui.drag_x = stbte__ui.select_x0;
+ stbte__ui.drag_y = stbte__ui.select_y0;
+ stbte__ui.drag_w = stbte__ui.select_x1 - stbte__ui.select_x0 + 1;
+ stbte__ui.drag_h = stbte__ui.select_y1 - stbte__ui.select_y0 + 1;
+ stbte__ui.drag_offx = mapx - stbte__ui.select_x0;
+ stbte__ui.drag_offy = mapy - stbte__ui.select_y0;
+ }
+ }
+ stbte__ui.has_selection = 0; // no selection until it completes
+ stbte__activate_map(mapx,mapy);
+ }
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_MAP_ACTIVE()) {
+ if (stbte__ui.dragging) {
+ stbte__drag_place(tm, mapx,mapy);
+ stbte__ui.dragging = 0;
+ stbte__activate(0);
+ } else {
+ stbte__select_rect(tm, stbte__ui.sx, stbte__ui.sy, mapx, mapy);
+ stbte__activate(0);
+ }
+ }
+ break;
+ case STBTE__rightdown:
+ stbte__ui.has_selection = 0;
+ break;
+ }
+ }
+ break;
+
+ case STBTE__tool_rect:
+ if (STBTE__IS_HOT(id)) {
+ switch (stbte__ui.event) {
+ case STBTE__leftdown:
+ if (STBTE__INACTIVE())
+ stbte__activate_map(mapx,mapy);
+ break;
+ case STBTE__leftup:
+ if (STBTE__IS_MAP_ACTIVE()) {
+ stbte__fillrect(tm, stbte__ui.sx, stbte__ui.sy, mapx, mapy, 1);
+ stbte__activate(0);
+ }
+ break;
+ case STBTE__rightdown:
+ if (STBTE__INACTIVE())
+ stbte__activate_map(mapx,mapy);
+ break;
+ case STBTE__rightup:
+ if (STBTE__IS_MAP_ACTIVE()) {
+ stbte__fillrect(tm, stbte__ui.sx, stbte__ui.sy, mapx, mapy, 0);
+ stbte__activate(0);
+ }
+ break;
+ }
+ }
+ break;
+
+
+ case STBTE__tool_eyedrop:
+ switch (stbte__ui.event) {
+ case STBTE__leftdown:
+ if (STBTE__IS_HOT(id) && STBTE__INACTIVE())
+ stbte__eyedrop(tm,mapx,mapy);
+ break;
+ }
+ break;
+ }
+}
+
+static void stbte__start_paste(stbte_tilemap *tm)
+{
+ if (stbte__ui.has_copy) {
+ stbte__ui.pasting = 1;
+ stbte__activate(STBTE__ID(STBTE__toolbarB,3));
+ }
+}
+
+static void stbte__toolbar(stbte_tilemap *tm, int x0, int y0, int w, int h)
+{
+ int i;
+ int estimated_width = 13 * STBTE__num_tool + 8+8+ 120+4 - 30;
+ int x = x0 + w/2 - estimated_width/2;
+ int y = y0+1;
+
+ for (i=0; i < STBTE__num_tool; ++i) {
+ int highlight=0, disable=0;
+ highlight = (stbte__ui.tool == i);
+ if (i == STBTE__tool_undo || i == STBTE__tool_showgrid)
+ x += 8;
+ if (i == STBTE__tool_showgrid && stbte__ui.show_grid)
+ highlight = 1;
+ if (i == STBTE__tool_showlinks && stbte__ui.show_links)
+ highlight = 1;
+ if (i == STBTE__tool_fill)
+ continue;
+ #ifndef STBTE_ALLOW_LINK
+ if (i == STBTE__tool_link || i == STBTE__tool_showlinks)
+ disable = 1;
+ #endif
+ if (i == STBTE__tool_undo && !stbte__undo_available(tm))
+ disable = 1;
+ if (i == STBTE__tool_redo && !stbte__redo_available(tm))
+ disable = 1;
+ if (stbte__button_icon(STBTE__ctoolbar_button, toolchar[i], x, y, 13, STBTE__ID(STBTE__toolbarA, i), highlight, disable)) {
+ switch (i) {
+ case STBTE__tool_eyedrop:
+ stbte__ui.eyedrop_last_layer = tm->num_layers; // flush eyedropper state
+ // fallthrough
+ default:
+ stbte__ui.tool = i;
+ stbte__ui.has_selection = 0;
+ break;
+ case STBTE__tool_showlinks:
+ stbte__ui.show_links = !stbte__ui.show_links;
+ break;
+ case STBTE__tool_showgrid:
+ stbte__ui.show_grid = (stbte__ui.show_grid+1)%3;
+ break;
+ case STBTE__tool_undo:
+ stbte__undo(tm);
+ break;
+ case STBTE__tool_redo:
+ stbte__redo(tm);
+ break;
+ }
+ }
+ x += 13;
+ }
+
+ x += 8;
+ if (stbte__button(STBTE__ctoolbar_button, "cut" , x, y,10, 40, STBTE__ID(STBTE__toolbarB,0), 0, !stbte__ui.has_selection))
+ stbte__copy_cut(tm, 1);
+ x += 42;
+ if (stbte__button(STBTE__ctoolbar_button, "copy" , x, y, 5, 40, STBTE__ID(STBTE__toolbarB,1), 0, !stbte__ui.has_selection))
+ stbte__copy_cut(tm, 0);
+ x += 42;
+ if (stbte__button(STBTE__ctoolbar_button, "paste", x, y, 0, 40, STBTE__ID(STBTE__toolbarB,2), stbte__ui.pasting, !stbte__ui.has_copy))
+ stbte__start_paste(tm);
+}
+
+#define STBTE__TEXTCOLOR(n) stbte__color_table[n][STBTE__text][STBTE__idle]
+
+static int stbte__info_value(const char *label, int x, int y, int val, int digits, int id)
+{
+ if (stbte__ui.event == STBTE__paint) {
+ int off = 9-stbte__get_char_width(label[0]);
+ char text[16];
+ stbte__sprintf(text stbte__sizeof(text), label, digits, val);
+ stbte__draw_text_core(x+off,y, text, 999, STBTE__TEXTCOLOR(STBTE__cpanel),1);
+ }
+ if (id) {
+ x += 9+7*digits+4;
+ if (stbte__minibutton(STBTE__cmapsize, x,y, '+', STBTE__ID2(id,1,0)))
+ val += (stbte__ui.shift ? 10 : 1);
+ x += 9;
+ if (stbte__minibutton(STBTE__cmapsize, x,y, '-', STBTE__ID2(id,2,0)))
+ val -= (stbte__ui.shift ? 10 : 1);
+ if (val < 1) val = 1; else if (val > 4096) val = 4096;
+ }
+ return val;
+}
+
+static void stbte__info(stbte_tilemap *tm, int x0, int y0, int w, int h)
+{
+ int mode = stbte__ui.panel[STBTE__panel_info].mode;
+ int s = 11+7*tm->digits+4+15;
+ int x,y;
+ int in_region;
+
+ x = x0+2;
+ y = y0+2;
+ tm->max_x = stbte__info_value("w:%*d",x,y, tm->max_x, tm->digits, STBTE__ID(STBTE__info,0));
+ if (mode)
+ x += s;
+ else
+ y += 11;
+ tm->max_y = stbte__info_value("h:%*d",x,y, tm->max_y, tm->digits, STBTE__ID(STBTE__info,1));
+ x = x0+2;
+ y += 11;
+ in_region = (stbte__ui.hot_id & 127) == STBTE__map;
+ stbte__info_value(in_region ? "x:%*d" : "x:",x,y, (stbte__ui.hot_id>>19)&4095, tm->digits, 0);
+ if (mode)
+ x += s;
+ else
+ y += 11;
+ stbte__info_value(in_region ? "y:%*d" : "y:",x,y, (stbte__ui.hot_id>> 7)&4095, tm->digits, 0);
+ y += 15;
+ x = x0+2;
+ stbte__draw_text(x,y,"brush:",40,STBTE__TEXTCOLOR(STBTE__cpanel));
+ if (tm->cur_tile >= 0)
+ STBTE_DRAW_TILE(x+43,y-3,tm->tiles[tm->cur_tile].id,1,0);
+}
+
+static void stbte__layers(stbte_tilemap *tm, int x0, int y0, int w, int h)
+{
+ static const char *propmodes[3] = {
+ "default", "always", "never"
+ };
+ int num_rows;
+ int i, y, n;
+ int x1 = x0+w;
+ int y1 = y0+h;
+ int xoff = 20;
+
+ if (tm->has_layer_names) {
+ int side = stbte__ui.panel[STBTE__panel_layers].side;
+ xoff = stbte__region[side].width - 42;
+ xoff = (xoff < tm->layername_width + 10 ? xoff : tm->layername_width + 10);
+ }
+
+ x0 += 2;
+ y0 += 5;
+ if (!tm->has_layer_names) {
+ if (stbte__ui.event == STBTE__paint) {
+ stbte__draw_text(x0,y0, "Layers", w-4, STBTE__TEXTCOLOR(STBTE__cpanel));
+ }
+ y0 += 11;
+ }
+ num_rows = (y1-y0)/15;
+#ifndef STBTE_NO_PROPS
+ --num_rows;
+#endif
+ y = y0;
+ for (i=0; i < tm->num_layers; ++i) {
+ char text[3], *str = (char *) tm->layerinfo[i].name;
+ static char lockedchar[3] = { 'U', 'P', 'L' };
+ int locked = tm->layerinfo[i].locked;
+ int disabled = (tm->solo_layer >= 0 && tm->solo_layer != i);
+ if (i-tm->layer_scroll >= 0 && i-tm->layer_scroll < num_rows) {
+ if (str == NULL)
+ stbte__sprintf(str=text stbte__sizeof(text), "%2d", i+1);
+ if (stbte__button(STBTE__clayer_button, str, x0,y,(i+1<10)*2,xoff-2, STBTE__ID(STBTE__layer,i), tm->cur_layer==i,0))
+ tm->cur_layer = (tm->cur_layer == i ? -1 : i);
+ if (stbte__layerbutton(x0+xoff + 0,y+1,'H',STBTE__ID(STBTE__hide,i), tm->layerinfo[i].hidden,disabled,STBTE__clayer_hide))
+ tm->layerinfo[i].hidden = !tm->layerinfo[i].hidden;
+ if (stbte__layerbutton(x0+xoff + 12,y+1,lockedchar[locked],STBTE__ID(STBTE__lock,i), locked!=0,disabled,STBTE__clayer_lock))
+ tm->layerinfo[i].locked = (locked+1)%3;
+ if (stbte__layerbutton(x0+xoff + 24,y+1,'S',STBTE__ID(STBTE__solo,i), tm->solo_layer==i,0,STBTE__clayer_solo))
+ tm->solo_layer = (tm->solo_layer == i ? -1 : i);
+ y += 15;
+ }
+ }
+ stbte__scrollbar(x1-4, y0,y-2, &tm->layer_scroll, 0, tm->num_layers, num_rows, STBTE__ID(STBTE__scrollbar_id, STBTE__layer));
+#ifndef STBTE_NO_PROPS
+ n = stbte__text_width("prop:")+2;
+ stbte__draw_text(x0,y+2, "prop:", w, STBTE__TEXTCOLOR(STBTE__cpanel));
+ i = w - n - 4;
+ if (i > 50) i = 50;
+ if (stbte__button(STBTE__clayer_button, propmodes[tm->propmode], x0+n,y,0,i, STBTE__ID(STBTE__layer,256), 0,0))
+ tm->propmode = (tm->propmode+1)%3;
+#endif
+}
+
+static void stbte__categories(stbte_tilemap *tm, int x0, int y0, int w, int h)
+{
+ int s=11, x,y, i;
+ int num_rows = h / s;
+
+ w -= 4;
+ x = x0+2;
+ y = y0+4;
+ if (tm->category_scroll == 0) {
+ if (stbte__category_button("*ALL*", x,y, w, STBTE__ID(STBTE__categories, 65535), tm->cur_category == -1)) {
+ stbte__choose_category(tm, -1);
+ }
+ y += s;
+ }
+
+ for (i=0; i < tm->num_categories; ++i) {
+ if (i+1 - tm->category_scroll >= 0 && i+1 - tm->category_scroll < num_rows) {
+ if (y + 10 > y0+h)
+ return;
+ if (stbte__category_button(tm->categories[i], x,y,w, STBTE__ID(STBTE__categories,i), tm->cur_category == i))
+ stbte__choose_category(tm, i);
+ y += s;
+ }
+ }
+ stbte__scrollbar(x0+w, y0+4, y0+h-4, &tm->category_scroll, 0, tm->num_categories+1, num_rows, STBTE__ID(STBTE__scrollbar_id, STBTE__categories));
+}
+
+static void stbte__tile_in_palette(stbte_tilemap *tm, int x, int y, int slot)
+{
+ stbte__tileinfo *t = &tm->tiles[slot];
+ int x0=x, y0=y, x1 = x+tm->palette_spacing_x - 1, y1 = y+tm->palette_spacing_y;
+ int id = STBTE__ID(STBTE__palette, slot);
+ stbte__hittest(x0,y0,x1,y1, id);
+ switch (stbte__ui.event) {
+ case STBTE__paint:
+ stbte__draw_rect(x,y,x+tm->palette_spacing_x-1,y+tm->palette_spacing_y-1, STBTE_COLOR_TILEPALETTE_BACKGROUND);
+ STBTE_DRAW_TILE(x,y,t->id, slot == tm->cur_tile,0);
+ if (slot == tm->cur_tile)
+ stbte__draw_frame_delayed(x-1,y-1,x+tm->palette_spacing_x,y+tm->palette_spacing_y, STBTE_COLOR_TILEPALETTE_OUTLINE);
+ break;
+ default:
+ if (stbte__button_core(id))
+ tm->cur_tile = slot;
+ break;
+ }
+}
+
+static void stbte__palette_of_tiles(stbte_tilemap *tm, int x0, int y0, int w, int h)
+{
+ int i,x,y;
+ int num_vis_rows = (h-6) / tm->palette_spacing_y;
+ int num_columns = (w-2-6) / tm->palette_spacing_x;
+ int num_total_rows;
+ int column,row;
+ int x1 = x0+w, y1=y0+h;
+ x = x0+2;
+ y = y0+6;
+
+ if (num_columns == 0)
+ return;
+
+ num_total_rows = (tm->cur_palette_count + num_columns-1) / num_columns; // ceil()
+
+ column = 0;
+ row = -tm->palette_scroll;
+ for (i=0; i < tm->num_tiles; ++i) {
+ stbte__tileinfo *t = &tm->tiles[i];
+
+ // filter based on category
+ if (tm->cur_category >= 0 && t->category_id != tm->cur_category)
+ continue;
+
+ // display it
+ if (row >= 0 && row < num_vis_rows) {
+ x = x0 + 2 + tm->palette_spacing_x * column;
+ y = y0 + 6 + tm->palette_spacing_y * row;
+ stbte__tile_in_palette(tm,x,y,i);
+ }
+
+ ++column;
+ if (column == num_columns) {
+ column = 0;
+ ++row;
+ }
+ }
+ stbte__flush_delay();
+ stbte__scrollbar(x1-4, y0+6, y1-2, &tm->palette_scroll, 0, num_total_rows, num_vis_rows, STBTE__ID(STBTE__scrollbar_id, STBTE__palette));
+}
+
+static float stbte__saved;
+static void stbte__props_panel(stbte_tilemap *tm, int x0, int y0, int w, int h)
+{
+ int x1 = x0+w;
+ int i;
+ int y = y0 + 5, x = x0+2;
+ int slider_width = 60;
+ int mx,my;
+ float *p;
+ short *data;
+ if (!stbte__is_single_selection())
+ return;
+ mx = stbte__ui.select_x0;
+ my = stbte__ui.select_y0;
+ p = tm->props[my][mx];
+ data = tm->data[my][mx];
+ STBTE__NOTUSED(data);
+ for (i=0; i < STBTE_MAX_PROPERTIES; ++i) {
+ unsigned int n = STBTE_PROP_TYPE(i, data, p);
+ if (n) {
+ char *s = (char*) STBTE_PROP_NAME(i, data, p);
+ if (s == NULL) s = (char*) "";
+ switch (n & 3) {
+ case STBTE_PROP_bool: {
+ int flag = (int) p[i];
+ if (stbte__layerbutton(x,y, flag ? 'x' : ' ', STBTE__ID(STBTE__prop_flag,i), flag, 0, 2)) {
+ stbte__begin_undo(tm);
+ stbte__undo_record_prop_float(tm,mx,my,i,(float) flag);
+ p[i] = (float) !flag;
+ stbte__end_undo(tm);
+ }
+ stbte__draw_text(x+13,y+1,s,x1-(x+13)-2,STBTE__TEXTCOLOR(STBTE__cpanel));
+ y += 13;
+ break;
+ }
+ case STBTE_PROP_int: {
+ int a = (int) STBTE_PROP_MIN(i,data,p);
+ int b = (int) STBTE_PROP_MAX(i,data,p);
+ int v = (int) p[i] - a;
+ if (a+v != p[i] || v < 0 || v > b-a) {
+ if (v < 0) v = 0;
+ if (v > b-a) v = b-a;
+ p[i] = (float) (a+v); // @TODO undo
+ }
+ switch (stbte__slider(x, slider_width, y+7, b-a, &v, STBTE__ID(STBTE__prop_int,i)))
+ {
+ case STBTE__begin:
+ stbte__saved = p[i];
+ // fallthrough
+ case STBTE__change:
+ p[i] = (float) (a+v); // @TODO undo
+ break;
+ case STBTE__end:
+ if (p[i] != stbte__saved) {
+ stbte__begin_undo(tm);
+ stbte__undo_record_prop_float(tm,mx,my,i,stbte__saved);
+ stbte__end_undo(tm);
+ }
+ break;
+ }
+ stbte__draw_text(x+slider_width+2,y+2, s, x1-1-(x+slider_width+2), STBTE__TEXTCOLOR(STBTE__cpanel));
+ y += 12;
+ break;
+ }
+ case STBTE_PROP_float: {
+ float a = (float) STBTE_PROP_MIN(i, data,p);
+ float b = (float) STBTE_PROP_MAX(i, data,p);
+ float c = STBTE_PROP_FLOAT_SCALE(i, data, p);
+ float old;
+ if (p[i] < a || p[i] > b) {
+ // @TODO undo
+ if (p[i] < a) p[i] = a;
+ if (p[i] > b) p[i] = b;
+ }
+ old = p[i];
+ switch (stbte__float_control(x, y, 50, a, b, c, "%8.4f", &p[i], STBTE__layer,STBTE__ID(STBTE__prop_float,i))) {
+ case STBTE__begin:
+ stbte__saved = old;
+ break;
+ case STBTE__end:
+ if (stbte__saved != p[i]) {
+ stbte__begin_undo(tm);
+ stbte__undo_record_prop_float(tm,mx,my,i, stbte__saved);
+ stbte__end_undo(tm);
+ }
+ break;
+ }
+ stbte__draw_text(x+53,y+1, s, x1-1-(x+53), STBTE__TEXTCOLOR(STBTE__cpanel));
+ y += 12;
+ break;
+ }
+ }
+ }
+ }
+}
+
+static int stbte__cp_mode, stbte__cp_aspect, stbte__save, stbte__cp_altered;
+#ifdef STBTE__COLORPICKER
+static int stbte__cp_state, stbte__cp_index, stbte__color_copy;
+static void stbte__dump_colorstate(void)
+{
+ int i,j,k;
+ printf("static int stbte__color_table[STBTE__num_color_modes][STBTE__num_color_aspects][STBTE__num_color_states] =\n");
+ printf("{\n");
+ printf(" {\n");
+ for (k=0; k < STBTE__num_color_modes; ++k) {
+ for (j=0; j < STBTE__num_color_aspects; ++j) {
+ printf(" { ");
+ for (i=0; i < STBTE__num_color_states; ++i) {
+ printf("0x%06x, ", stbte__color_table[k][j][i]);
+ }
+ printf("},\n");
+ }
+ if (k+1 < STBTE__num_color_modes)
+ printf(" }, {\n");
+ else
+ printf(" },\n");
+ }
+ printf("};\n");
+}
+
+static void stbte__colorpicker(int x0, int y0, int w, int h)
+{
+ int x1 = x0+w, y1 = y0+h, x,y, i;
+
+ x = x0+2; y = y0+6;
+
+ y += 5;
+ x += 8;
+
+
+ {
+ int color = stbte__color_table[stbte__cp_mode][stbte__cp_aspect][stbte__cp_index];
+ int rgb[3];
+ if (stbte__cp_altered && stbte__cp_index == STBTE__idle)
+ color = stbte__save;
+
+ if (stbte__minibutton(STBTE__cmapsize, x1-20,y+ 5, 'C', STBTE__ID2(STBTE__colorpick_id,4,0)))
+ stbte__color_copy = color;
+ if (stbte__minibutton(STBTE__cmapsize, x1-20,y+15, 'P', STBTE__ID2(STBTE__colorpick_id,4,1)))
+ color = stbte__color_copy;
+
+ rgb[0] = color >> 16; rgb[1] = (color>>8)&255; rgb[2] = color & 255;
+ for (i=0; i < 3; ++i) {
+ if (stbte__slider(x+8,64, y, 255, rgb+i, STBTE__ID2(STBTE__colorpick_id,3,i)) > 0)
+ stbte__dump_colorstate();
+ y += 15;
+ }
+ if (stbte__ui.event != STBTE__paint && stbte__ui.event != STBTE__tick)
+ stbte__color_table[stbte__cp_mode][stbte__cp_aspect][stbte__cp_index] = (rgb[0]<<16)|(rgb[1]<<8)|(rgb[2]);
+ }
+
+ y += 5;
+
+ // states
+ x = x0+2+35;
+ if (stbte__ui.event == STBTE__paint) {
+ static const char *states[] = { "idle", "over", "down", "down&over", "selected", "selected&over", "disabled" };
+ stbte__draw_text(x, y+1, states[stbte__cp_index], x1-x-1, 0xffffff);
+ }
+
+ x = x0+24; y += 12;
+
+ for (i=3; i >= 0; --i) {
+ int state = 0 != (stbte__cp_state & (1 << i));
+ if (stbte__layerbutton(x,y, "OASD"[i], STBTE__ID2(STBTE__colorpick_id, 0,i), state,0, STBTE__clayer_button)) {
+ stbte__cp_state ^= (1 << i);
+ stbte__cp_index = stbte__state_to_index[0][0][0][stbte__cp_state];
+ }
+ x += 16;
+ }
+ x = x0+2; y += 18;
+
+ for (i=0; i < 3; ++i) {
+ static const char *labels[] = { "Base", "Edge", "Text" };
+ if (stbte__button(STBTE__ctoolbar_button, labels[i], x,y,0,36, STBTE__ID2(STBTE__colorpick_id,1,i), stbte__cp_aspect==i,0))
+ stbte__cp_aspect = i;
+ x += 40;
+ }
+
+ y += 18;
+ x = x0+2;
+
+ for (i=0; i < STBTE__num_color_modes; ++i) {
+ if (stbte__button(STBTE__ctoolbar_button, stbte__color_names[i], x, y, 0,80, STBTE__ID2(STBTE__colorpick_id,2,i), stbte__cp_mode == i,0))
+ stbte__cp_mode = i;
+ y += 12;
+ }
+
+ // make the currently selected aspect flash, unless we're actively dragging color slider etc
+ if (stbte__ui.event == STBTE__tick) {
+ stbte__save = stbte__color_table[stbte__cp_mode][stbte__cp_aspect][STBTE__idle];
+ if ((stbte__ui.active_id & 127) != STBTE__colorpick_id) {
+ if ((stbte__ui.ms_time & 2047) < 200) {
+ stbte__color_table[stbte__cp_mode][stbte__cp_aspect][STBTE__idle] ^= 0x1f1f1f;
+ stbte__cp_altered = 1;
+ }
+ }
+ }
+}
+#endif
+
+static void stbte__editor_traverse(stbte_tilemap *tm)
+{
+ int i,j,i0,j0,i1,j1,n;
+
+ if (tm == NULL)
+ return;
+ if (stbte__ui.x0 == stbte__ui.x1 || stbte__ui.y0 == stbte__ui.y1)
+ return;
+
+ stbte__prepare_tileinfo(tm);
+
+ stbte__compute_panel_locations(tm); // @OPTIMIZE: we don't need to recompute this every time
+
+ if (stbte__ui.event == STBTE__paint) {
+ // fill screen with border
+ stbte__draw_rect(stbte__ui.x0, stbte__ui.y0, stbte__ui.x1, stbte__ui.y1, STBTE_COLOR_TILEMAP_BORDER);
+ // fill tilemap with tilemap background
+ stbte__draw_rect(stbte__ui.x0 - tm->scroll_x, stbte__ui.y0 - tm->scroll_y,
+ stbte__ui.x0 - tm->scroll_x + tm->spacing_x * tm->max_x,
+ stbte__ui.y0 - tm->scroll_y + tm->spacing_y * tm->max_y, STBTE_COLOR_TILEMAP_BACKGROUND);
+ }
+
+ // step 1: traverse all the tilemap data...
+
+ i0 = (tm->scroll_x - tm->spacing_x) / tm->spacing_x;
+ j0 = (tm->scroll_y - tm->spacing_y) / tm->spacing_y;
+ i1 = (tm->scroll_x + stbte__ui.x1 - stbte__ui.x0) / tm->spacing_x + 1;
+ j1 = (tm->scroll_y + stbte__ui.y1 - stbte__ui.y0) / tm->spacing_y + 1;
+
+ if (i0 < 0) i0 = 0;
+ if (j0 < 0) j0 = 0;
+ if (i1 > tm->max_x) i1 = tm->max_x;
+ if (j1 > tm->max_y) j1 = tm->max_y;
+
+ if (stbte__ui.event == STBTE__paint) {
+ // draw all of layer 0, then all of layer 1, etc, instead of old
+ // way which drew entire stack of each tile at once
+ for (n=0; n < tm->num_layers; ++n) {
+ for (j=j0; j < j1; ++j) {
+ for (i=i0; i < i1; ++i) {
+ int x = stbte__ui.x0 + i * tm->spacing_x - tm->scroll_x;
+ int y = stbte__ui.y0 + j * tm->spacing_y - tm->scroll_y;
+ stbte__tile_paint(tm, x, y, i, j, n);
+ }
+ }
+ if (n == 0 && stbte__ui.show_grid == 1) {
+ int x = stbte__ui.x0 + i0 * tm->spacing_x - tm->scroll_x;
+ int y = stbte__ui.y0 + j0 * tm->spacing_y - tm->scroll_y;
+ for (i=0; x < stbte__ui.x1 && i <= i1; ++i, x += tm->spacing_x)
+ stbte__draw_rect(x, stbte__ui.y0, x+1, stbte__ui.y1, STBTE_COLOR_GRID);
+ for (j=0; y < stbte__ui.y1 && j <= j1; ++j, y += tm->spacing_y)
+ stbte__draw_rect(stbte__ui.x0, y, stbte__ui.x1, y+1, STBTE_COLOR_GRID);
+ }
+ }
+ }
+
+ if (stbte__ui.event == STBTE__paint) {
+ // draw grid on top of everything except UI
+ if (stbte__ui.show_grid == 2) {
+ int x = stbte__ui.x0 + i0 * tm->spacing_x - tm->scroll_x;
+ int y = stbte__ui.y0 + j0 * tm->spacing_y - tm->scroll_y;
+ for (i=0; x < stbte__ui.x1 && i <= i1; ++i, x += tm->spacing_x)
+ stbte__draw_rect(x, stbte__ui.y0, x+1, stbte__ui.y1, STBTE_COLOR_GRID);
+ for (j=0; y < stbte__ui.y1 && j <= j1; ++j, y += tm->spacing_y)
+ stbte__draw_rect(stbte__ui.x0, y, stbte__ui.x1, y+1, STBTE_COLOR_GRID);
+ }
+ }
+
+ for (j=j0; j < j1; ++j) {
+ for (i=i0; i < i1; ++i) {
+ int x = stbte__ui.x0 + i * tm->spacing_x - tm->scroll_x;
+ int y = stbte__ui.y0 + j * tm->spacing_y - tm->scroll_y;
+ stbte__tile(tm, x, y, i, j);
+ }
+ }
+
+ if (stbte__ui.event == STBTE__paint) {
+ // draw the selection border
+ if (stbte__ui.has_selection) {
+ int x0,y0,x1,y1;
+ x0 = stbte__ui.x0 + (stbte__ui.select_x0 ) * tm->spacing_x - tm->scroll_x;
+ y0 = stbte__ui.y0 + (stbte__ui.select_y0 ) * tm->spacing_y - tm->scroll_y;
+ x1 = stbte__ui.x0 + (stbte__ui.select_x1 + 1) * tm->spacing_x - tm->scroll_x + 1;
+ y1 = stbte__ui.y0 + (stbte__ui.select_y1 + 1) * tm->spacing_y - tm->scroll_y + 1;
+ stbte__draw_frame(x0,y0,x1,y1, (stbte__ui.ms_time & 256 ? STBTE_COLOR_SELECTION_OUTLINE1 : STBTE_COLOR_SELECTION_OUTLINE2));
+ }
+
+ stbte__flush_delay(); // draw a dynamic link on top of the queued links
+
+ #ifdef STBTE_ALLOW_LINK
+ if (stbte__ui.linking && STBTE__IS_MAP_HOT()) {
+ int x0,y0,x1,y1;
+ int color;
+ int ex = ((stbte__ui.hot_id >> 19) & 4095);
+ int ey = ((stbte__ui.hot_id >> 7) & 4095);
+ x0 = stbte__ui.x0 + (stbte__ui.sx ) * tm->spacing_x - tm->scroll_x + (tm->spacing_x>>1)+1;
+ y0 = stbte__ui.y0 + (stbte__ui.sy ) * tm->spacing_y - tm->scroll_y + (tm->spacing_y>>1)+1;
+ x1 = stbte__ui.x0 + (ex ) * tm->spacing_x - tm->scroll_x + (tm->spacing_x>>1)-1;
+ y1 = stbte__ui.y0 + (ey ) * tm->spacing_y - tm->scroll_y + (tm->spacing_y>>1)-1;
+ if (STBTE_ALLOW_LINK(tm->data[stbte__ui.sy][stbte__ui.sx], tm->props[stbte__ui.sy][stbte__ui.sx], tm->data[ey][ex], tm->props[ey][ex]))
+ color = STBTE_LINK_COLOR_DRAWING;
+ else
+ color = STBTE_LINK_COLOR_DISALLOWED;
+ stbte__draw_link(x0,y0,x1,y1, color);
+ }
+ #endif
+ }
+ stbte__flush_delay();
+
+ // step 2: traverse the panels
+ for (i=0; i < STBTE__num_panel; ++i) {
+ stbte__panel *p = &stbte__ui.panel[i];
+ if (stbte__ui.event == STBTE__paint) {
+ stbte__draw_box(p->x0,p->y0,p->x0+p->width,p->y0+p->height, STBTE__cpanel, STBTE__idle);
+ }
+ // obscure tilemap data underneath panel
+ stbte__hittest(p->x0,p->y0,p->x0+p->width,p->y0+p->height, STBTE__ID2(STBTE__panel, i, 0));
+ switch (i) {
+ case STBTE__panel_toolbar:
+ if (stbte__ui.event == STBTE__paint)
+ stbte__draw_rect(p->x0,p->y0,p->x0+p->width,p->y0+p->height, stbte__color_table[STBTE__ctoolbar][STBTE__base][STBTE__idle]);
+ stbte__toolbar(tm,p->x0,p->y0,p->width,p->height);
+ break;
+ case STBTE__panel_info:
+ stbte__info(tm,p->x0,p->y0,p->width,p->height);
+ break;
+ case STBTE__panel_layers:
+ stbte__layers(tm,p->x0,p->y0,p->width,p->height);
+ break;
+ case STBTE__panel_categories:
+ stbte__categories(tm,p->x0,p->y0,p->width,p->height);
+ break;
+ case STBTE__panel_colorpick:
+#ifdef STBTE__COLORPICKER
+ stbte__colorpicker(p->x0,p->y0,p->width,p->height);
+#endif
+ break;
+ case STBTE__panel_tiles:
+ // erase boundary between categories and tiles if they're on same side
+ if (stbte__ui.event == STBTE__paint && p->side == stbte__ui.panel[STBTE__panel_categories].side)
+ stbte__draw_rect(p->x0+1,p->y0-1,p->x0+p->width-1,p->y0+1, stbte__color_table[STBTE__cpanel][STBTE__base][STBTE__idle]);
+ stbte__palette_of_tiles(tm,p->x0,p->y0,p->width,p->height);
+ break;
+ case STBTE__panel_props:
+ stbte__props_panel(tm,p->x0,p->y0,p->width,p->height);
+ break;
+ }
+ // draw the panel side selectors
+ for (j=0; j < 2; ++j) {
+ int result;
+ if (i == STBTE__panel_toolbar) continue;
+ result = stbte__microbutton(p->x0+p->width - 1 - 2*4 + 4*j,p->y0+2,3, STBTE__ID2(STBTE__panel, i, j+1), STBTE__cpanel_sider+j);
+ if (result) {
+ switch (j) {
+ case 0: p->side = result > 0 ? STBTE__side_left : STBTE__side_right; break;
+ case 1: p->delta_height += result; break;
+ }
+ }
+ }
+ }
+
+ if (stbte__ui.panel[STBTE__panel_categories].delta_height < -5) stbte__ui.panel[STBTE__panel_categories].delta_height = -5;
+ if (stbte__ui.panel[STBTE__panel_layers ].delta_height < -5) stbte__ui.panel[STBTE__panel_layers ].delta_height = -5;
+
+
+ // step 3: traverse the regions to place expander controls on them
+ for (i=0; i < 2; ++i) {
+ if (stbte__region[i].active) {
+ int x = stbte__region[i].x;
+ int width;
+ if (i == STBTE__side_left)
+ width = stbte__ui.left_width , x += stbte__region[i].width + 1;
+ else
+ width = -stbte__ui.right_width, x -= 6;
+ if (stbte__microbutton_dragger(x, stbte__region[i].y+2, 5, STBTE__ID(STBTE__region,i), &width)) {
+ // toggle: if fully expanded (0), start the retract animation; otherwise snap back to expanded
+ if (stbte__region[i].retracted == 0.0)
+ stbte__region[i].retracted = 0.01f;
+ else
+ stbte__region[i].retracted = 0.0;
+ }
+ if (i == STBTE__side_left)
+ stbte__ui.left_width = width;
+ else
+ stbte__ui.right_width = -width;
+ if (stbte__ui.event == STBTE__tick) {
+ if (stbte__region[i].retracted && stbte__region[i].retracted < 1.0f) {
+ stbte__region[i].retracted += stbte__ui.dt*4;
+ if (stbte__region[i].retracted > 1)
+ stbte__region[i].retracted = 1;
+ }
+ }
+ }
+ }
+
+ if (stbte__ui.event == STBTE__paint && stbte__ui.alert_msg) {
+ int w = stbte__text_width(stbte__ui.alert_msg);
+ int x = (stbte__ui.x0+stbte__ui.x1)/2;
+ int y = (stbte__ui.y0+stbte__ui.y1)*5/6;
+ stbte__draw_rect (x-w/2-4,y-8, x+w/2+4,y+8, 0x604020);
+ stbte__draw_frame(x-w/2-4,y-8, x+w/2+4,y+8, 0x906030);
+ stbte__draw_text (x-w/2,y-4, stbte__ui.alert_msg, w+1, 0xff8040);
+ }
+
+#ifdef STBTE_SHOW_CURSOR
+ if (stbte__ui.event == STBTE__paint)
+ stbte__draw_bitmap(stbte__ui.mx, stbte__ui.my, stbte__get_char_width(26), stbte__get_char_bitmap(26), 0xe0e0e0);
+#endif
+
+ if (stbte__ui.event == STBTE__tick && stbte__ui.alert_msg) {
+ stbte__ui.alert_timer -= stbte__ui.dt;
+ if (stbte__ui.alert_timer < 0) {
+ stbte__ui.alert_timer = 0;
+ stbte__ui.alert_msg = 0;
+ }
+ }
+
+ if (stbte__ui.event == STBTE__paint) {
+ stbte__color_table[stbte__cp_mode][stbte__cp_aspect][STBTE__idle] = stbte__save;
+ stbte__cp_altered = 0;
+ }
+}
+
+static void stbte__do_event(stbte_tilemap *tm)
+{
+ stbte__ui.next_hot_id = 0;
+ stbte__editor_traverse(tm);
+ stbte__ui.hot_id = stbte__ui.next_hot_id;
+
+ // automatically cancel on mouse-up in case the object that triggered it
+ // doesn't exist anymore
+ if (stbte__ui.active_id) {
+ if (stbte__ui.event == STBTE__leftup || stbte__ui.event == STBTE__rightup) {
+ if (!stbte__ui.pasting) {
+ stbte__activate(0);
+ if (stbte__ui.undoing)
+ stbte__end_undo(tm);
+ stbte__ui.scrolling = 0;
+ stbte__ui.dragging = 0;
+ stbte__ui.linking = 0;
+ }
+ }
+ }
+
+ // we could do this stuff in the widgets directly, but it would keep recomputing
+ // the same thing on every tile, which seems dumb.
+
+ if (stbte__ui.pasting) {
+ if (STBTE__IS_MAP_HOT()) {
+ // compute pasting location based on last hot
+ stbte__ui.paste_x = ((stbte__ui.hot_id >> 19) & 4095) - (stbte__ui.copy_width >> 1);
+ stbte__ui.paste_y = ((stbte__ui.hot_id >> 7) & 4095) - (stbte__ui.copy_height >> 1);
+ }
+ }
+ if (stbte__ui.dragging) {
+ if (STBTE__IS_MAP_HOT()) {
+ stbte__ui.drag_dest_x = ((stbte__ui.hot_id >> 19) & 4095) - stbte__ui.drag_offx;
+ stbte__ui.drag_dest_y = ((stbte__ui.hot_id >> 7) & 4095) - stbte__ui.drag_offy;
+ }
+ }
+}
+
+static void stbte__set_event(int event, int x, int y)
+{
+ stbte__ui.event = event;
+ stbte__ui.mx = x;
+ stbte__ui.my = y;
+ stbte__ui.dx = x - stbte__ui.last_mouse_x;
+ stbte__ui.dy = y - stbte__ui.last_mouse_y;
+ stbte__ui.last_mouse_x = x;
+ stbte__ui.last_mouse_y = y;
+ stbte__ui.accum_x += stbte__ui.dx;
+ stbte__ui.accum_y += stbte__ui.dy;
+}
+
+void stbte_draw(stbte_tilemap *tm)
+{
+ stbte__ui.event = STBTE__paint;
+ stbte__editor_traverse(tm);
+}
+
+void stbte_mouse_move(stbte_tilemap *tm, int x, int y, int shifted, int scrollkey)
+{
+ stbte__set_event(STBTE__mousemove, x,y);
+ stbte__ui.shift = shifted;
+ stbte__ui.scrollkey = scrollkey;
+ stbte__do_event(tm);
+}
+
+void stbte_mouse_button(stbte_tilemap *tm, int x, int y, int right, int down, int shifted, int scrollkey)
+{
+ static int events[2][2] = { { STBTE__leftup , STBTE__leftdown },
+ { STBTE__rightup, STBTE__rightdown } };
+ stbte__set_event(events[right][down], x,y);
+ stbte__ui.shift = shifted;
+ stbte__ui.scrollkey = scrollkey;
+
+ stbte__do_event(tm);
+}
+
+void stbte_mouse_wheel(stbte_tilemap *tm, int x, int y, int vscroll)
+{
+ // not implemented yet -- need different way of hittesting
+}
+
+void stbte_action(stbte_tilemap *tm, enum stbte_action act)
+{
+ switch (act) {
+ case STBTE_tool_select: stbte__ui.tool = STBTE__tool_select; break;
+ case STBTE_tool_brush: stbte__ui.tool = STBTE__tool_brush; break;
+ case STBTE_tool_erase: stbte__ui.tool = STBTE__tool_erase; break;
+ case STBTE_tool_rectangle: stbte__ui.tool = STBTE__tool_rect; break;
+ case STBTE_tool_eyedropper: stbte__ui.tool = STBTE__tool_eyedrop; break;
+ case STBTE_tool_link: stbte__ui.tool = STBTE__tool_link; break;
+ case STBTE_act_toggle_grid: stbte__ui.show_grid = (stbte__ui.show_grid+1) % 3; break;
+ case STBTE_act_toggle_links: stbte__ui.show_links ^= 1; break;
+ case STBTE_act_undo: stbte__undo(tm); break;
+ case STBTE_act_redo: stbte__redo(tm); break;
+ case STBTE_act_cut: stbte__copy_cut(tm, 1); break;
+ case STBTE_act_copy: stbte__copy_cut(tm, 0); break;
+ case STBTE_act_paste: stbte__start_paste(tm); break;
+ case STBTE_scroll_left: tm->scroll_x -= tm->spacing_x; break;
+ case STBTE_scroll_right: tm->scroll_x += tm->spacing_x; break;
+ case STBTE_scroll_up: tm->scroll_y -= tm->spacing_y; break;
+ case STBTE_scroll_down: tm->scroll_y += tm->spacing_y; break;
+ }
+}
+
+void stbte_tick(stbte_tilemap *tm, float dt)
+{
+ stbte__ui.event = STBTE__tick;
+ stbte__ui.dt = dt;
+ stbte__do_event(tm);
+ stbte__ui.ms_time += (int) (dt * 1024) + 1; // make sure if time is superfast it always updates a little
+}
+
+void stbte_mouse_sdl(stbte_tilemap *tm, const void *sdl_event, float xs, float ys, int xo, int yo)
+{
+#ifdef _SDL_H
+ SDL_Event *event = (SDL_Event *) sdl_event;
+ SDL_Keymod km = SDL_GetModState();
+ int shift = (km & KMOD_LCTRL) || (km & KMOD_RCTRL);
+ int scrollkey = 0 != SDL_GetKeyboardState(NULL)[SDL_SCANCODE_SPACE];
+ switch (event->type) {
+ case SDL_MOUSEMOTION:
+ stbte_mouse_move(tm, (int) (xs*event->motion.x+xo), (int) (ys*event->motion.y+yo), shift, scrollkey);
+ break;
+ case SDL_MOUSEBUTTONUP:
+ stbte_mouse_button(tm, (int) (xs*event->button.x+xo), (int) (ys*event->button.y+yo), event->button.button != SDL_BUTTON_LEFT, 0, shift, scrollkey);
+ break;
+ case SDL_MOUSEBUTTONDOWN:
+ stbte_mouse_button(tm, (int) (xs*event->button.x+xo), (int) (ys*event->button.y+yo), event->button.button != SDL_BUTTON_LEFT, 1, shift, scrollkey);
+ break;
+ case SDL_MOUSEWHEEL:
+ stbte_mouse_wheel(tm, stbte__ui.mx, stbte__ui.my, event->wheel.y);
+ break;
+ }
+#else
+ STBTE__NOTUSED(tm);
+ STBTE__NOTUSED(sdl_event);
+ STBTE__NOTUSED(xs);
+ STBTE__NOTUSED(ys);
+ STBTE__NOTUSED(xo);
+ STBTE__NOTUSED(yo);
+#endif
+}
+
+#endif // STB_TILEMAP_EDITOR_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_truetype.h b/vendor/stb/stb_truetype.h
new file mode 100644
index 0000000..90a5c2e
--- /dev/null
+++ b/vendor/stb/stb_truetype.h
@@ -0,0 +1,5079 @@
+// stb_truetype.h - v1.26 - public domain
+// authored from 2009-2021 by Sean Barrett / RAD Game Tools
+//
+// =======================================================================
+//
+// NO SECURITY GUARANTEE -- DO NOT USE THIS ON UNTRUSTED FONT FILES
+//
+// This library does no range checking of the offsets found in the file,
+// meaning an attacker can use it to read arbitrary memory.
+//
+// =======================================================================
+//
+// This library processes TrueType files:
+// parse files
+// extract glyph metrics
+// extract glyph shapes
+// render glyphs to one-channel bitmaps with antialiasing (box filter)
+// render glyphs to one-channel SDF bitmaps (signed-distance field/function)
+//
+// Todo:
+// non-MS cmaps
+// crashproof on bad data
+// hinting? (no longer patented)
+// cleartype-style AA?
+// optimize: use simple memory allocator for intermediates
+// optimize: build edge-list directly from curves
+// optimize: rasterize directly from curves?
+//
+// ADDITIONAL CONTRIBUTORS
+//
+// Mikko Mononen: compound shape support, more cmap formats
+// Tor Andersson: kerning, subpixel rendering
+// Dougall Johnson: OpenType / Type 2 font handling
+// Daniel Ribeiro Maciel: basic GPOS-based kerning
+//
+// Misc other:
+// Ryan Gordon
+// Simon Glass
+// github:IntellectualKitty
+// Imanol Celaya
+// Daniel Ribeiro Maciel
+//
+// Bug/warning reports/fixes:
+// "Zer" on mollyrocket Fabian "ryg" Giesen github:NiLuJe
+// Cass Everitt Martins Mozeiko github:aloucks
+// stoiko (Haemimont Games) Cap Petschulat github:oyvindjam
+// Brian Hook Omar Cornut github:vassvik
+// Walter van Niftrik Ryan Griege
+// David Gow Peter LaValle
+// David Given Sergey Popov
+// Ivan-Assen Ivanov Giumo X. Clanjor
+// Anthony Pesch Higor Euripedes
+// Johan Duparc Thomas Fields
+// Hou Qiming Derek Vinyard
+// Rob Loach Cort Stratton
+// Kenney Phillis Jr. Brian Costabile
+// Ken Voskuil (kaesve) Yakov Galka
+//
+// VERSION HISTORY
+//
+// 1.26 (2021-08-28) fix broken rasterizer
+// 1.25 (2021-07-11) many fixes
+// 1.24 (2020-02-05) fix warning
+// 1.23 (2020-02-02) query SVG data for glyphs; query whole kerning table (but only kern not GPOS)
+// 1.22 (2019-08-11) minimize missing-glyph duplication; fix kerning if both 'GPOS' and 'kern' are defined
+// 1.21 (2019-02-25) fix warning
+// 1.20 (2019-02-07) PackFontRange skips missing codepoints; GetScaleFontVMetrics()
+// 1.19 (2018-02-11) GPOS kerning, STBTT_fmod
+// 1.18 (2018-01-29) add missing function
+// 1.17 (2017-07-23) make more arguments const; doc fix
+// 1.16 (2017-07-12) SDF support
+// 1.15 (2017-03-03) make more arguments const
+// 1.14 (2017-01-16) num-fonts-in-TTC function
+// 1.13 (2017-01-02) support OpenType fonts, certain Apple fonts
+// 1.12 (2016-10-25) suppress warnings about casting away const with -Wcast-qual
+// 1.11 (2016-04-02) fix unused-variable warning
+// 1.10 (2016-04-02) user-defined fabs(); rare memory leak; remove duplicate typedef
+// 1.09 (2016-01-16) warning fix; avoid crash on outofmem; use allocation userdata properly
+// 1.08 (2015-09-13) document stbtt_Rasterize(); fixes for vertical & horizontal edges
+// 1.07 (2015-08-01) allow PackFontRanges to accept arrays of sparse codepoints;
+// variant PackFontRanges to pack and render in separate phases;
+// fix stbtt_GetFontOffsetForIndex (never worked for non-0 input?);
+// fixed an assert() bug in the new rasterizer
+// replace assert() with STBTT_assert() in new rasterizer
+//
+// Full history can be found at the end of this file.
+//
+// LICENSE
+//
+// See end of file for license information.
+//
+// USAGE
+//
+// Include this file in whatever places need to refer to it. In ONE C/C++
+// file, write:
+// #define STB_TRUETYPE_IMPLEMENTATION
+// before the #include of this file. This expands out the actual
+// implementation into that C/C++ file.
+//
+// To make the implementation private to the file that generates the implementation,
+// #define STBTT_STATIC
+//
+// Simple 3D API (don't ship this, but it's fine for tools and quick start)
+// stbtt_BakeFontBitmap() -- bake a font to a bitmap for use as texture
+// stbtt_GetBakedQuad() -- compute quad to draw for a given char
+//
+// Improved 3D API (more shippable):
+// #include "stb_rect_pack.h" -- optional, but you really want it
+// stbtt_PackBegin()
+// stbtt_PackSetOversampling() -- for improved quality on small fonts
+// stbtt_PackFontRanges() -- packs and renders
+// stbtt_PackEnd()
+// stbtt_GetPackedQuad()
+//
+// "Load" a font file from a memory buffer (you have to keep the buffer loaded)
+// stbtt_InitFont()
+// stbtt_GetFontOffsetForIndex() -- indexing for TTC font collections
+// stbtt_GetNumberOfFonts() -- number of fonts for TTC font collections
+//
+// Render a unicode codepoint to a bitmap
+// stbtt_GetCodepointBitmap() -- allocates and returns a bitmap
+// stbtt_MakeCodepointBitmap() -- renders into bitmap you provide
+// stbtt_GetCodepointBitmapBox() -- how big the bitmap must be
+//
+// Character advance/positioning
+// stbtt_GetCodepointHMetrics()
+// stbtt_GetFontVMetrics()
+// stbtt_GetFontVMetricsOS2()
+// stbtt_GetCodepointKernAdvance()
+//
+// Starting with version 1.06, the rasterizer was replaced with a new,
+// faster and generally-more-precise rasterizer. The new rasterizer more
+// accurately measures pixel coverage for anti-aliasing, except in the case
+// where multiple shapes overlap, in which case it overestimates the AA pixel
+// coverage. Thus, anti-aliasing of intersecting shapes may look wrong. If
+// this turns out to be a problem, you can re-enable the old rasterizer with
+// #define STBTT_RASTERIZER_VERSION 1
+// which will incur about a 15% speed hit.
+//
+// ADDITIONAL DOCUMENTATION
+//
+// Immediately after this block comment are a series of sample programs.
+//
+// After the sample programs is the "header file" section. This section
+// includes documentation for each API function.
+//
+// Some important concepts to understand to use this library:
+//
+// Codepoint
+// Characters are defined by unicode codepoints, e.g. 65 is
+// uppercase A, 231 is lowercase c with a cedilla, 0x7e30 is
+// the hiragana for "ma".
+//
+// Glyph
+// A visual character shape (every codepoint is rendered as
+// some glyph)
+//
+// Glyph index
+// A font-specific integer ID representing a glyph
+//
+// Baseline
+// Glyph shapes are defined relative to a baseline, which is the
+// bottom of uppercase characters. Characters extend both above
+// and below the baseline.
+//
+// Current Point
+// As you draw text to the screen, you keep track of a "current point"
+// which is the origin of each character. The current point's vertical
+// position is the baseline. Even "baked fonts" use this model.
+//
+// Vertical Font Metrics
+// The vertical qualities of the font, used to vertically position
+// and space the characters. See docs for stbtt_GetFontVMetrics.
+//
+// Font Size in Pixels or Points
+// The preferred interface for specifying font sizes in stb_truetype
+// is to specify how tall the font's vertical extent should be in pixels.
+// If that sounds good enough, skip the next paragraph.
+//
+// Most font APIs instead use "points", which are a common typographic
+// measurement for describing font size, defined as 72 points per inch.
+// stb_truetype provides a point API for compatibility. However, true
+// "per inch" conventions don't make much sense on computer displays
+// since different monitors have different numbers of pixels per
+// inch. For example, Windows traditionally uses a convention that
+// there are 96 pixels per inch, thus making 'inch' measurements have
+// nothing to do with inches, and thus effectively defining a point to
+// be 1.333 pixels. Additionally, the TrueType font data provides
+// an explicit scale factor to scale a given font's glyphs to points,
+// but the author has observed that this scale factor is often wrong
+// for non-commercial fonts, thus making fonts scaled in points
+// according to the TrueType spec incoherently sized in practice.
+//
+// DETAILED USAGE:
+//
+// Scale:
+// Select how high you want the font to be, in points or pixels.
+// Call ScaleForPixelHeight or ScaleForMappingEmToPixels to compute
+// a scale factor SF that will be used by all other functions.
+//
+// Baseline:
+// You need to select a y-coordinate that is the baseline of where
+// your text will appear. Call GetFontBoundingBox to get the baseline-relative
+// bounding box for all characters. SF*-y0 will be the distance in pixels
+// that the worst-case character could extend above the baseline, so if
+// you want the top edge of characters to appear at the top of the
+// screen where y=0, then you would set the baseline to SF*-y0.
+//
+// Current point:
+// Set the current point where the first character will appear. The
+// first character could extend left of the current point; this is font
+// dependent. You can either choose a current point that is the leftmost
+// point and hope, or add some padding, or check the bounding box or
+// left-side-bearing of the first character to be displayed and set
+// the current point based on that.
+//
+// Displaying a character:
+// Compute the bounding box of the character. It will contain signed values
+// relative to <current_point, baseline>. I.e. if it returns x0,y0,x1,y1,
+// then the character should be displayed in the rectangle from
+// <current_point+SF*x0, baseline+SF*y0> to <current_point+SF*x1,baseline+SF*y1).
+//
+// Advancing for the next character:
+// Call GlyphHMetrics, and compute 'current_point += SF * advance'.
+//
+//
+//////////////////////////////////////////////////////////////////////////////
+//
+// SAMPLE PROGRAMS
+//
+// Incomplete text-in-3d-api example, which draws quads properly aligned to be lossless
+//
+#if 0
+#define STB_TRUETYPE_IMPLEMENTATION // force following include to generate implementation
+#include "stb_truetype.h"
+
+unsigned char ttf_buffer[1<<20];
+unsigned char temp_bitmap[512*512];
+
+stbtt_bakedchar cdata[96]; // ASCII 32..126 is 95 glyphs
+GLuint ftex;
+
+void my_stbtt_initfont(void)
+{
+ fread(ttf_buffer, 1, 1<<20, fopen("c:/windows/fonts/times.ttf", "rb"));
+ stbtt_BakeFontBitmap(ttf_buffer,0, 32.0, temp_bitmap,512,512, 32,96, cdata); // no guarantee this fits!
+ // can free ttf_buffer at this point
+ glGenTextures(1, &ftex);
+ glBindTexture(GL_TEXTURE_2D, ftex);
+ glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, 512,512, 0, GL_ALPHA, GL_UNSIGNED_BYTE, temp_bitmap);
+ // can free temp_bitmap at this point
+ glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
+}
+
+void my_stbtt_print(float x, float y, char *text)
+{
+ // assume orthographic projection with units = screen pixels, origin at top left
+ glEnable(GL_BLEND);
+ glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
+ glEnable(GL_TEXTURE_2D);
+ glBindTexture(GL_TEXTURE_2D, ftex);
+ glBegin(GL_QUADS);
+ while (*text) {
+ if (*text >= 32 && *text < 128) {
+ stbtt_aligned_quad q;
+ stbtt_GetBakedQuad(cdata, 512,512, *text-32, &x,&y,&q,1);//1=opengl & d3d10+,0=d3d9
+ glTexCoord2f(q.s0,q.t0); glVertex2f(q.x0,q.y0);
+ glTexCoord2f(q.s1,q.t0); glVertex2f(q.x1,q.y0);
+ glTexCoord2f(q.s1,q.t1); glVertex2f(q.x1,q.y1);
+ glTexCoord2f(q.s0,q.t1); glVertex2f(q.x0,q.y1);
+ }
+ ++text;
+ }
+ glEnd();
+}
+#endif
+//
+//
+//////////////////////////////////////////////////////////////////////////////
+//
+// Complete program (this compiles): get a single bitmap, print as ASCII art
+//
+#if 0
+#include <stdio.h>
+#define STB_TRUETYPE_IMPLEMENTATION // force following include to generate implementation
+#include "stb_truetype.h"
+
+char ttf_buffer[1<<25];
+
+int main(int argc, char **argv)
+{
+ stbtt_fontinfo font;
+ unsigned char *bitmap;
+ int w,h,i,j,c = (argc > 1 ? atoi(argv[1]) : 'a'), s = (argc > 2 ? atoi(argv[2]) : 20);
+
+ fread(ttf_buffer, 1, 1<<25, fopen(argc > 3 ? argv[3] : "c:/windows/fonts/arialbd.ttf", "rb"));
+
+ stbtt_InitFont(&font, ttf_buffer, stbtt_GetFontOffsetForIndex(ttf_buffer,0));
+ bitmap = stbtt_GetCodepointBitmap(&font, 0,stbtt_ScaleForPixelHeight(&font, s), c, &w, &h, 0,0);
+
+ for (j=0; j < h; ++j) {
+ for (i=0; i < w; ++i)
+ putchar(" .:ioVM@"[bitmap[j*w+i]>>5]);
+ putchar('\n');
+ }
+ return 0;
+}
+#endif
+//
+// Output:
+//
+// .ii.
+// @@@@@@.
+// V@Mio@@o
+// :i. V@V
+// :oM@@M
+// :@@@MM@M
+// @@o o@M
+// :@@. M@M
+// @@@o@@@@
+// :M@@V:@@.
+//
+//////////////////////////////////////////////////////////////////////////////
+//
+// Complete program: print "Hello World!" banner, with bugs
+//
+#if 0
+char buffer[24<<20];
+unsigned char screen[20][79];
+
+int main(int argc, char **argv)
+{
+ stbtt_fontinfo font;
+ int i,j,ascent,baseline,ch=0;
+ float scale, xpos=2; // leave a little padding in case the character extends left
+ char *text = "Heljo World!"; // intentionally misspelled to show 'lj' brokenness
+
+ fread(buffer, 1, 1000000, fopen("c:/windows/fonts/arialbd.ttf", "rb"));
+ stbtt_InitFont(&font, buffer, 0);
+
+ scale = stbtt_ScaleForPixelHeight(&font, 15);
+ stbtt_GetFontVMetrics(&font, &ascent,0,0);
+ baseline = (int) (ascent*scale);
+
+ while (text[ch]) {
+ int advance,lsb,x0,y0,x1,y1;
+ float x_shift = xpos - (float) floor(xpos);
+ stbtt_GetCodepointHMetrics(&font, text[ch], &advance, &lsb);
+ stbtt_GetCodepointBitmapBoxSubpixel(&font, text[ch], scale,scale,x_shift,0, &x0,&y0,&x1,&y1);
+ stbtt_MakeCodepointBitmapSubpixel(&font, &screen[baseline + y0][(int) xpos + x0], x1-x0,y1-y0, 79, scale,scale,x_shift,0, text[ch]);
+ // note that this stomps the old data, so where character boxes overlap (e.g. 'lj') it's wrong
+ // because this API is really for baking character bitmaps into textures. if you want to render
+ // a sequence of characters, you really need to render each bitmap to a temp buffer, then
+ // "alpha blend" that into the working buffer
+ xpos += (advance * scale);
+ if (text[ch+1])
+ xpos += scale*stbtt_GetCodepointKernAdvance(&font, text[ch],text[ch+1]);
+ ++ch;
+ }
+
+ for (j=0; j < 20; ++j) {
+ for (i=0; i < 78; ++i)
+ putchar(" .:ioVM@"[screen[j][i]>>5]);
+ putchar('\n');
+ }
+
+ return 0;
+}
+#endif
+
+
+//////////////////////////////////////////////////////////////////////////////
+//////////////////////////////////////////////////////////////////////////////
+////
+//// INTEGRATION WITH YOUR CODEBASE
+////
+//// The following sections allow you to supply alternate definitions
+//// of C library functions used by stb_truetype, e.g. if you don't
+//// link with the C runtime library.
+
+#ifdef STB_TRUETYPE_IMPLEMENTATION
+ // #define your own (u)stbtt_int8/16/32 before including to override this
+ #ifndef stbtt_uint8
+ typedef unsigned char stbtt_uint8;
+ typedef signed char stbtt_int8;
+ typedef unsigned short stbtt_uint16;
+ typedef signed short stbtt_int16;
+ typedef unsigned int stbtt_uint32;
+ typedef signed int stbtt_int32;
+ #endif
+
+ typedef char stbtt__check_size32[sizeof(stbtt_int32)==4 ? 1 : -1];
+ typedef char stbtt__check_size16[sizeof(stbtt_int16)==2 ? 1 : -1];
+
+ // e.g. #define your own STBTT_ifloor/STBTT_iceil() to avoid math.h
+ #ifndef STBTT_ifloor
+ #include <math.h>
+ #define STBTT_ifloor(x) ((int) floor(x))
+ #define STBTT_iceil(x) ((int) ceil(x))
+ #endif
+
+ #ifndef STBTT_sqrt
+ #include <math.h>
+ #define STBTT_sqrt(x) sqrt(x)
+ #define STBTT_pow(x,y) pow(x,y)
+ #endif
+
+ #ifndef STBTT_fmod
+ #include <math.h>
+ #define STBTT_fmod(x,y) fmod(x,y)
+ #endif
+
+ #ifndef STBTT_cos
+ #include <math.h>
+ #define STBTT_cos(x) cos(x)
+ #define STBTT_acos(x) acos(x)
+ #endif
+
+ #ifndef STBTT_fabs
+ #include <math.h>
+ #define STBTT_fabs(x) fabs(x)
+ #endif
+
+ // #define your own functions "STBTT_malloc" / "STBTT_free" to avoid malloc.h
+ #ifndef STBTT_malloc
+ #include <stdlib.h>
+ #define STBTT_malloc(x,u) ((void)(u),malloc(x))
+ #define STBTT_free(x,u) ((void)(u),free(x))
+ #endif
+
+ #ifndef STBTT_assert
+ #include <assert.h>
+ #define STBTT_assert(x) assert(x)
+ #endif
+
+ #ifndef STBTT_strlen
+ #include <string.h>
+ #define STBTT_strlen(x) strlen(x)
+ #endif
+
+ #ifndef STBTT_memcpy
+ #include <string.h>
+ #define STBTT_memcpy memcpy
+ #define STBTT_memset memset
+ #endif
+#endif
+
+///////////////////////////////////////////////////////////////////////////////
+///////////////////////////////////////////////////////////////////////////////
+////
+//// INTERFACE
+////
+////
+
+#ifndef __STB_INCLUDE_STB_TRUETYPE_H__
+#define __STB_INCLUDE_STB_TRUETYPE_H__
+
+#ifdef STBTT_STATIC
+#define STBTT_DEF static
+#else
+#define STBTT_DEF extern
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// private structure
+typedef struct
+{
+ unsigned char *data;
+ int cursor;
+ int size;
+} stbtt__buf;
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// TEXTURE BAKING API
+//
+// If you use this API, you only have to call two functions ever.
+//
+
+typedef struct
+{
+ unsigned short x0,y0,x1,y1; // coordinates of bbox in bitmap
+ float xoff,yoff,xadvance;
+} stbtt_bakedchar;
+
+STBTT_DEF int stbtt_BakeFontBitmap(const unsigned char *data, int offset, // font location (use offset=0 for plain .ttf)
+ float pixel_height, // height of font in pixels
+ unsigned char *pixels, int pw, int ph, // bitmap to be filled in
+ int first_char, int num_chars, // characters to bake
+ stbtt_bakedchar *chardata); // you allocate this, it's num_chars long
+// if return is positive, the first unused row of the bitmap
+// if return is negative, returns the negative of the number of characters that fit
+// if return is 0, no characters fit and no rows were used
+// This uses a very crappy packing.
+
+typedef struct
+{
+ float x0,y0,s0,t0; // top-left
+ float x1,y1,s1,t1; // bottom-right
+} stbtt_aligned_quad;
+
+STBTT_DEF void stbtt_GetBakedQuad(const stbtt_bakedchar *chardata, int pw, int ph, // same data as above
+ int char_index, // character to display
+ float *xpos, float *ypos, // pointers to current position in screen pixel space
+ stbtt_aligned_quad *q, // output: quad to draw
+ int opengl_fillrule); // true if opengl fill rule; false if DX9 or earlier
+// Call GetBakedQuad with char_index = 'character - first_char', and it
+// creates the quad you need to draw and advances the current position.
+//
+// The coordinate system used assumes y increases downwards.
+//
+// Characters will extend both above and below the current position;
+// see discussion of "BASELINE" above.
+//
+// It's inefficient; you might want to c&p it and optimize it.
+
+STBTT_DEF void stbtt_GetScaledFontVMetrics(const unsigned char *fontdata, int index, float size, float *ascent, float *descent, float *lineGap);
+// Query the font vertical metrics without having to create a font first.
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// NEW TEXTURE BAKING API
+//
+// This provides options for packing multiple fonts into one atlas, not
+// perfectly but better than nothing.
+
+typedef struct
+{
+ unsigned short x0,y0,x1,y1; // coordinates of bbox in bitmap
+ float xoff,yoff,xadvance;
+ float xoff2,yoff2;
+} stbtt_packedchar;
+
+typedef struct stbtt_pack_context stbtt_pack_context;
+typedef struct stbtt_fontinfo stbtt_fontinfo;
+#ifndef STB_RECT_PACK_VERSION
+typedef struct stbrp_rect stbrp_rect;
+#endif
+
+STBTT_DEF int stbtt_PackBegin(stbtt_pack_context *spc, unsigned char *pixels, int width, int height, int stride_in_bytes, int padding, void *alloc_context);
+// Initializes a packing context stored in the passed-in stbtt_pack_context.
+// Future calls using this context will pack characters into the bitmap passed
+// in here: a 1-channel bitmap that is width * height. stride_in_bytes is
+// the distance from one row to the next (or 0 to mean they are packed tightly
+// together). "padding" is the amount of padding to leave between each
+// character (normally you want '1' for bitmaps you'll use as textures with
+// bilinear filtering).
+//
+// Returns 0 on failure, 1 on success.
+
+STBTT_DEF void stbtt_PackEnd (stbtt_pack_context *spc);
+// Cleans up the packing context and frees all memory.
+
+#define STBTT_POINT_SIZE(x) (-(x))
+
+STBTT_DEF int stbtt_PackFontRange(stbtt_pack_context *spc, const unsigned char *fontdata, int font_index, float font_size,
+ int first_unicode_char_in_range, int num_chars_in_range, stbtt_packedchar *chardata_for_range);
+// Creates character bitmaps from the font_index'th font found in fontdata (use
+// font_index=0 if you don't know what that is). It creates num_chars_in_range
+// bitmaps for characters with unicode values starting at first_unicode_char_in_range
+// and increasing. Data for how to render them is stored in chardata_for_range;
+// pass these to stbtt_GetPackedQuad to get back renderable quads.
+//
+// font_size is the full height of the character from ascender to descender,
+// as computed by stbtt_ScaleForPixelHeight. To use a point size as computed
+// by stbtt_ScaleForMappingEmToPixels, wrap the point size in STBTT_POINT_SIZE()
+// and pass that result as 'font_size':
+// ..., 20 , ... // font max minus min y is 20 pixels tall
+// ..., STBTT_POINT_SIZE(20), ... // 'M' is 20 pixels tall
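STBTT_POINT_SIZE just negates the size; the sign of 'font_size' is what tells the packer which scaling function to apply. A self-contained illustration (the macro is renamed POINT_SIZE here to avoid clashing with the real one, and the two helpers are hypothetical):

```c
#include <assert.h>

#define POINT_SIZE(x) (-(x))   /* same definition as STBTT_POINT_SIZE above */

/* Hypothetical dispatch mirroring how a packer could interpret the sign:
   positive => pixel height (stbtt_ScaleForPixelHeight),
   negative => point size (stbtt_ScaleForMappingEmToPixels). */
static int   is_point_size(float font_size) { return font_size < 0; }
static float actual_size  (float font_size) { return font_size < 0 ? -font_size : font_size; }
```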
+
+typedef struct
+{
+ float font_size;
+ int first_unicode_codepoint_in_range; // if non-zero, then the chars are continuous, and this is the first codepoint
+ int *array_of_unicode_codepoints; // if non-zero, then this is an array of unicode codepoints
+ int num_chars;
+ stbtt_packedchar *chardata_for_range; // output
+ unsigned char h_oversample, v_oversample; // don't set these, they're used internally
+} stbtt_pack_range;
+
+STBTT_DEF int stbtt_PackFontRanges(stbtt_pack_context *spc, const unsigned char *fontdata, int font_index, stbtt_pack_range *ranges, int num_ranges);
+// Creates character bitmaps from multiple ranges of characters stored in
+// ranges. This will usually create a better-packed bitmap than multiple
+// calls to stbtt_PackFontRange. Note that you can call this multiple
+// times within a single PackBegin/PackEnd.
+
+STBTT_DEF void stbtt_PackSetOversampling(stbtt_pack_context *spc, unsigned int h_oversample, unsigned int v_oversample);
+// Oversampling a font increases the quality by allowing higher-quality subpixel
+// positioning, and is especially valuable at smaller text sizes.
+//
+// This function sets the amount of oversampling for all following calls to
+// stbtt_PackFontRange(s) or stbtt_PackFontRangesGatherRects for a given
+// pack context. The default (no oversampling) is achieved by h_oversample=1
+// and v_oversample=1. The total number of pixels required is
+// h_oversample*v_oversample larger than the default; for example, 2x2
+// oversampling requires 4x the storage of 1x1. For best results, render
+// oversampled textures with bilinear filtering. Look at the readme in
+// stb/tests/oversample for information about oversampled fonts.
+//
+// To use with PackFontRangesGather etc., you must set it before
+// calling PackFontRangesGatherRects.
+
+STBTT_DEF void stbtt_PackSetSkipMissingCodepoints(stbtt_pack_context *spc, int skip);
+// If skip != 0, this tells stb_truetype to skip any codepoints for which
+// there is no corresponding glyph. If skip=0, which is the default, then
+// codepoints without a glyph receive the font's "missing character" glyph,
+// typically an empty box by convention.
+
+STBTT_DEF void stbtt_GetPackedQuad(const stbtt_packedchar *chardata, int pw, int ph, // same data as above
+ int char_index, // character to display
+ float *xpos, float *ypos, // pointers to current position in screen pixel space
+ stbtt_aligned_quad *q, // output: quad to draw
+ int align_to_integer);
+
+STBTT_DEF int stbtt_PackFontRangesGatherRects(stbtt_pack_context *spc, const stbtt_fontinfo *info, stbtt_pack_range *ranges, int num_ranges, stbrp_rect *rects);
+STBTT_DEF void stbtt_PackFontRangesPackRects(stbtt_pack_context *spc, stbrp_rect *rects, int num_rects);
+STBTT_DEF int stbtt_PackFontRangesRenderIntoRects(stbtt_pack_context *spc, const stbtt_fontinfo *info, stbtt_pack_range *ranges, int num_ranges, stbrp_rect *rects);
+// Calling these functions in sequence is roughly equivalent to calling
+// stbtt_PackFontRanges(). If you want more control over the packing of multiple
+// fonts, or if you want to pack custom data into a font texture, take a look
+// at the source of stbtt_PackFontRanges() and create a custom version
+// using these functions, e.g. call GatherRects multiple times,
+// building up a single array of rects, then call PackRects once,
+// then call RenderIntoRects repeatedly. This may result in a
+// better packing than calling PackFontRanges multiple times
+// (or it may not).
+
+// this is an opaque structure that you shouldn't mess with which holds
+// all the context needed from PackBegin to PackEnd.
+struct stbtt_pack_context {
+ void *user_allocator_context;
+ void *pack_info;
+ int width;
+ int height;
+ int stride_in_bytes;
+ int padding;
+ int skip_missing;
+ unsigned int h_oversample, v_oversample;
+ unsigned char *pixels;
+ void *nodes;
+};
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// FONT LOADING
+//
+//
+
+STBTT_DEF int stbtt_GetNumberOfFonts(const unsigned char *data);
+// This function will determine the number of fonts in a font file. TrueType
+// collection (.ttc) files may contain multiple fonts, while TrueType font
+// (.ttf) files only contain one font. The number of fonts can be used for
+// indexing with stbtt_GetFontOffsetForIndex, where the index is between zero
+// and one less than the total fonts. If an error occurs, -1 is returned.
+
+STBTT_DEF int stbtt_GetFontOffsetForIndex(const unsigned char *data, int index);
+// Each .ttf/.ttc file may have more than one font. Each font has a sequential
+// index number starting from 0. Call this function to get the font offset for
+// a given index; it returns -1 if the index is out of range. A regular .ttf
+// file will only define one font and it will always be at offset 0, so it will
+// return '0' for index 0, and -1 for all other indices.
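For a TrueType collection, the per-font offsets live in the 'ttcf' header: the tag, a version, a 32-bit font count at byte 8, then one 32-bit offset per font starting at byte 12 (layout per the TTC spec). A hedged standalone sketch, not the library's internal code:

```c
#include <assert.h>
#include <string.h>

typedef unsigned char u8;
typedef unsigned int  u32;

/* big-endian 32-bit read */
static u32 be32(const u8 *p) { return ((u32)p[0]<<24) | ((u32)p[1]<<16) | ((u32)p[2]<<8) | p[3]; }

/* Returns the byte offset of font 'index' in a .ttc, or -1 if out of range.
   (A plain .ttf would instead yield 0 for index 0 and -1 otherwise.) */
static int ttc_font_offset(const u8 *data, int index)
{
   u32 num_fonts;
   if (memcmp(data, "ttcf", 4) != 0) return -1;   /* not a collection */
   num_fonts = be32(data + 8);                    /* numFonts field */
   if (index < 0 || (u32)index >= num_fonts) return -1;
   return (int)be32(data + 12 + 4*index);         /* offset table entry */
}
```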
+
+// The following structure is defined publicly so you can declare one on
+// the stack or as a global or etc, but you should treat it as opaque.
+struct stbtt_fontinfo
+{
+ void * userdata;
+ unsigned char * data; // pointer to .ttf file
+ int fontstart; // offset of start of font
+
+ int numGlyphs; // number of glyphs, needed for range checking
+
+ int loca,head,glyf,hhea,hmtx,kern,gpos,svg; // table locations as offset from start of .ttf
+ int index_map; // a cmap mapping for our chosen character encoding
+ int indexToLocFormat; // format needed to map from glyph index to glyph
+
+ stbtt__buf cff; // cff font data
+ stbtt__buf charstrings; // the charstring index
+ stbtt__buf gsubrs; // global charstring subroutines index
+ stbtt__buf subrs; // private charstring subroutines index
+ stbtt__buf fontdicts; // array of font dicts
+ stbtt__buf fdselect; // map from glyph to fontdict
+};
+
+STBTT_DEF int stbtt_InitFont(stbtt_fontinfo *info, const unsigned char *data, int offset);
+// Given an offset into the file that defines a font, this function builds
+// the necessary cached info for the rest of the system. You must allocate
+// the stbtt_fontinfo yourself, and stbtt_InitFont will fill it out. You don't
+// need to do anything special to free it, because the contents are pure
+// value data with no additional data structures. Returns 0 on failure.
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// CHARACTER TO GLYPH-INDEX CONVERSION
+
+STBTT_DEF int stbtt_FindGlyphIndex(const stbtt_fontinfo *info, int unicode_codepoint);
+// If you're going to perform multiple operations on the same character
+// and you want a speed-up, call this function with the character you're
+// going to process, then use glyph-based functions instead of the
+// codepoint-based functions.
+// Returns 0 if the character codepoint is not defined in the font.
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// CHARACTER PROPERTIES
+//
+
+STBTT_DEF float stbtt_ScaleForPixelHeight(const stbtt_fontinfo *info, float pixels);
+// computes a scale factor to produce a font whose "height" is 'pixels' tall.
+// Height is measured as the distance from the highest ascender to the lowest
+// descender; in other words, it's equivalent to calling stbtt_GetFontVMetrics
+// and computing:
+// scale = pixels / (ascent - descent)
+// so if you prefer to measure height by the ascent only, use a similar calculation.
+
+STBTT_DEF float stbtt_ScaleForMappingEmToPixels(const stbtt_fontinfo *info, float pixels);
+// computes a scale factor to produce a font whose EM size is mapped to
+// 'pixels' tall. This is probably what traditional APIs compute, but
+// I'm not positive.
+
+STBTT_DEF void stbtt_GetFontVMetrics(const stbtt_fontinfo *info, int *ascent, int *descent, int *lineGap);
+// ascent is the coordinate above the baseline the font extends; descent
+// is the coordinate below the baseline the font extends (i.e. it is typically negative)
+// lineGap is the spacing between one row's descent and the next row's ascent...
+// so you should advance the vertical position by "*ascent - *descent + *lineGap"
+// these are expressed in unscaled coordinates, so you must multiply by
+// the scale factor for a given size
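Putting the two together: the baseline-to-baseline advance is (ascent - descent + lineGap) * scale. The values below (ascent=1900, descent=-500, lineGap=100) are made-up metrics of typical magnitude for a 2048-unit em, purely for illustration:

```c
#include <assert.h>

/* Per-line vertical advance in pixels from unscaled font metrics,
   per the prescription above: "*ascent - *descent + *lineGap", scaled. */
static float line_advance(int ascent, int descent, int lineGap, float scale)
{
   return (ascent - descent + lineGap) * scale;
}

/* Scale factor matching stbtt_ScaleForPixelHeight's documented formula:
   scale = pixels / (ascent - descent) */
static float scale_for_pixel_height(int ascent, int descent, float pixels)
{
   return pixels / (float)(ascent - descent);
}
```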
+
+STBTT_DEF int stbtt_GetFontVMetricsOS2(const stbtt_fontinfo *info, int *typoAscent, int *typoDescent, int *typoLineGap);
+// analogous to GetFontVMetrics, but returns the "typographic" values from the OS/2
+// table (specific to MS/Windows TTF files).
+//
+// Returns 1 on success (table present), 0 on failure.
+
+STBTT_DEF void stbtt_GetFontBoundingBox(const stbtt_fontinfo *info, int *x0, int *y0, int *x1, int *y1);
+// the bounding box around all possible characters
+
+STBTT_DEF void stbtt_GetCodepointHMetrics(const stbtt_fontinfo *info, int codepoint, int *advanceWidth, int *leftSideBearing);
+// leftSideBearing is the offset from the current horizontal position to the left edge of the character
+// advanceWidth is the offset from the current horizontal position to the next horizontal position
+// these are expressed in unscaled coordinates
+
+STBTT_DEF int stbtt_GetCodepointKernAdvance(const stbtt_fontinfo *info, int ch1, int ch2);
+// an additional amount to add to the 'advance' value between ch1 and ch2
+
+STBTT_DEF int stbtt_GetCodepointBox(const stbtt_fontinfo *info, int codepoint, int *x0, int *y0, int *x1, int *y1);
+// Gets the bounding box of the visible part of the glyph, in unscaled coordinates
+
+STBTT_DEF void stbtt_GetGlyphHMetrics(const stbtt_fontinfo *info, int glyph_index, int *advanceWidth, int *leftSideBearing);
+STBTT_DEF int stbtt_GetGlyphKernAdvance(const stbtt_fontinfo *info, int glyph1, int glyph2);
+STBTT_DEF int stbtt_GetGlyphBox(const stbtt_fontinfo *info, int glyph_index, int *x0, int *y0, int *x1, int *y1);
+// as above, but takes one or more glyph indices for greater efficiency
+
+typedef struct stbtt_kerningentry
+{
+ int glyph1; // use stbtt_FindGlyphIndex
+ int glyph2;
+ int advance;
+} stbtt_kerningentry;
+
+STBTT_DEF int stbtt_GetKerningTableLength(const stbtt_fontinfo *info);
+STBTT_DEF int stbtt_GetKerningTable(const stbtt_fontinfo *info, stbtt_kerningentry* table, int table_length);
+// Retrieves a complete list of all of the kerning pairs provided by the font
+// stbtt_GetKerningTable never writes more than table_length entries and returns how many entries it did write.
+// The table will be sorted by (a.glyph1 == b.glyph1)?(a.glyph2 < b.glyph2):(a.glyph1 < b.glyph1)
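The sort order stated above is "by glyph1, then by glyph2". Written out as a qsort-compatible comparator on a local mirror of stbtt_kerningentry (illustrative, not the library's code):

```c
#include <assert.h>
#include <stdlib.h>

/* local mirror of stbtt_kerningentry for illustration */
typedef struct { int glyph1, glyph2, advance; } kerningentry;

/* Orders entries the way stbtt_GetKerningTable documents:
   primary key glyph1, secondary key glyph2. */
static int kern_cmp(const void *pa, const void *pb)
{
   const kerningentry *a = (const kerningentry*)pa;
   const kerningentry *b = (const kerningentry*)pb;
   if (a->glyph1 != b->glyph1) return a->glyph1 < b->glyph1 ? -1 : 1;
   if (a->glyph2 != b->glyph2) return a->glyph2 < b->glyph2 ? -1 : 1;
   return 0;
}
```

A sorted table like this also permits binary search by (glyph1, glyph2) pair.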
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// GLYPH SHAPES (you probably don't need these, but they have to go before
+// the bitmaps for C declaration-order reasons)
+//
+
+#ifndef STBTT_vmove // you can predefine these to use different values (but why?)
+ enum {
+ STBTT_vmove=1,
+ STBTT_vline,
+ STBTT_vcurve,
+ STBTT_vcubic
+ };
+#endif
+
+#ifndef stbtt_vertex // you can predefine this to use different values
+ // (we share this with other code at RAD)
+ #define stbtt_vertex_type short // can't use stbtt_int16 because that's not visible in the header file
+ typedef struct
+ {
+ stbtt_vertex_type x,y,cx,cy,cx1,cy1;
+ unsigned char type,padding;
+ } stbtt_vertex;
+#endif
+
+STBTT_DEF int stbtt_IsGlyphEmpty(const stbtt_fontinfo *info, int glyph_index);
+// returns non-zero if nothing is drawn for this glyph
+
+STBTT_DEF int stbtt_GetCodepointShape(const stbtt_fontinfo *info, int unicode_codepoint, stbtt_vertex **vertices);
+STBTT_DEF int stbtt_GetGlyphShape(const stbtt_fontinfo *info, int glyph_index, stbtt_vertex **vertices);
+// returns # of vertices and fills *vertices with the pointer to them
+// these are expressed in "unscaled" coordinates
+//
+// The shape is a series of contours. Each one starts with
+// a STBTT_moveto, then consists of a series of mixed
+// STBTT_lineto and STBTT_curveto segments. A lineto
+// draws a line from previous endpoint to its x,y; a curveto
+// draws a quadratic bezier from previous endpoint to
+// its x,y, using cx,cy as the bezier control point.
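A returned shape is typically walked by switching on the vertex type. The sketch below counts contours with a local mirror of stbtt_vertex and the STBTT_v* values defined above (the sample data is hypothetical, not from a real font):

```c
#include <assert.h>

enum { v_vmove = 1, v_vline, v_vcurve, v_vcubic };  /* mirrors STBTT_vmove etc. */

/* subset of stbtt_vertex, for illustration */
typedef struct { short x, y, cx, cy; unsigned char type; } vertex;

/* Each moveto starts a new contour; lineto/curveto extend the current one. */
static int count_contours(const vertex *v, int n)
{
   int i, contours = 0;
   for (i = 0; i < n; i++)
      if (v[i].type == v_vmove)
         contours++;
   return contours;
}
```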
+
+STBTT_DEF void stbtt_FreeShape(const stbtt_fontinfo *info, stbtt_vertex *vertices);
+// frees the data allocated above
+
+STBTT_DEF unsigned char *stbtt_FindSVGDoc(const stbtt_fontinfo *info, int gl);
+STBTT_DEF int stbtt_GetCodepointSVG(const stbtt_fontinfo *info, int unicode_codepoint, const char **svg);
+STBTT_DEF int stbtt_GetGlyphSVG(const stbtt_fontinfo *info, int gl, const char **svg);
+// fills svg with the character's SVG data.
+// returns data size or 0 if SVG not found.
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// BITMAP RENDERING
+//
+
+STBTT_DEF void stbtt_FreeBitmap(unsigned char *bitmap, void *userdata);
+// frees the bitmap allocated below
+
+STBTT_DEF unsigned char *stbtt_GetCodepointBitmap(const stbtt_fontinfo *info, float scale_x, float scale_y, int codepoint, int *width, int *height, int *xoff, int *yoff);
+// allocates a large-enough single-channel 8bpp bitmap and renders the
+// specified character/glyph at the specified scale into it, with
+// antialiasing. 0 is no coverage (transparent), 255 is fully covered (opaque).
+// *width & *height are filled out with the width & height of the bitmap,
+// which is stored left-to-right, top-to-bottom.
+//
+// xoff/yoff are the offset in pixel space from the glyph origin to the top-left of the bitmap
+
+STBTT_DEF unsigned char *stbtt_GetCodepointBitmapSubpixel(const stbtt_fontinfo *info, float scale_x, float scale_y, float shift_x, float shift_y, int codepoint, int *width, int *height, int *xoff, int *yoff);
+// the same as stbtt_GetCodepointBitmap, but you can specify a subpixel
+// shift for the character
+
+STBTT_DEF void stbtt_MakeCodepointBitmap(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, int codepoint);
+// the same as stbtt_GetCodepointBitmap, but you pass in storage for the bitmap
+// in the form of 'output', with row spacing of 'out_stride' bytes. the bitmap
+// is clipped to out_w/out_h bytes. Call stbtt_GetCodepointBitmapBox to get the
+// width and height and positioning info for it first.
+
+STBTT_DEF void stbtt_MakeCodepointBitmapSubpixel(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int codepoint);
+// same as stbtt_MakeCodepointBitmap, but you can specify a subpixel
+// shift for the character
+
+STBTT_DEF void stbtt_MakeCodepointBitmapSubpixelPrefilter(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int oversample_x, int oversample_y, float *sub_x, float *sub_y, int codepoint);
+// same as stbtt_MakeCodepointBitmapSubpixel, but prefiltering
+// is performed (see stbtt_PackSetOversampling)
+
+STBTT_DEF void stbtt_GetCodepointBitmapBox(const stbtt_fontinfo *font, int codepoint, float scale_x, float scale_y, int *ix0, int *iy0, int *ix1, int *iy1);
+// get the bbox of the bitmap centered around the glyph origin; so the
+// bitmap width is ix1-ix0, height is iy1-iy0, and location to place
+// the bitmap top left is (leftSideBearing*scale,iy0).
+// (Note that the bitmap uses y-increases-down, but the shape uses
+// y-increases-up, so CodepointBitmapBox and CodepointBox are inverted.)
+
+STBTT_DEF void stbtt_GetCodepointBitmapBoxSubpixel(const stbtt_fontinfo *font, int codepoint, float scale_x, float scale_y, float shift_x, float shift_y, int *ix0, int *iy0, int *ix1, int *iy1);
+// same as stbtt_GetCodepointBitmapBox, but you can specify a subpixel
+// shift for the character
+
+// the following functions are equivalent to the above functions, but operate
+// on glyph indices instead of Unicode codepoints (for efficiency)
+STBTT_DEF unsigned char *stbtt_GetGlyphBitmap(const stbtt_fontinfo *info, float scale_x, float scale_y, int glyph, int *width, int *height, int *xoff, int *yoff);
+STBTT_DEF unsigned char *stbtt_GetGlyphBitmapSubpixel(const stbtt_fontinfo *info, float scale_x, float scale_y, float shift_x, float shift_y, int glyph, int *width, int *height, int *xoff, int *yoff);
+STBTT_DEF void stbtt_MakeGlyphBitmap(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, int glyph);
+STBTT_DEF void stbtt_MakeGlyphBitmapSubpixel(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int glyph);
+STBTT_DEF void stbtt_MakeGlyphBitmapSubpixelPrefilter(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int oversample_x, int oversample_y, float *sub_x, float *sub_y, int glyph);
+STBTT_DEF void stbtt_GetGlyphBitmapBox(const stbtt_fontinfo *font, int glyph, float scale_x, float scale_y, int *ix0, int *iy0, int *ix1, int *iy1);
+STBTT_DEF void stbtt_GetGlyphBitmapBoxSubpixel(const stbtt_fontinfo *font, int glyph, float scale_x, float scale_y,float shift_x, float shift_y, int *ix0, int *iy0, int *ix1, int *iy1);
+
+
+// @TODO: don't expose this structure
+typedef struct
+{
+ int w,h,stride;
+ unsigned char *pixels;
+} stbtt__bitmap;
+
+// rasterize a shape with quadratic beziers into a bitmap
+STBTT_DEF void stbtt_Rasterize(stbtt__bitmap *result, // 1-channel bitmap to draw into
+ float flatness_in_pixels, // allowable error of curve in pixels
+ stbtt_vertex *vertices, // array of vertices defining shape
+ int num_verts, // number of vertices in above array
+ float scale_x, float scale_y, // scale applied to input vertices
+ float shift_x, float shift_y, // translation applied to input vertices
+ int x_off, int y_off, // another translation applied to input
+ int invert, // if non-zero, vertically flip shape
+ void *userdata); // context for STBTT_MALLOC
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Signed Distance Function (or Field) rendering
+
+STBTT_DEF void stbtt_FreeSDF(unsigned char *bitmap, void *userdata);
+// frees the SDF bitmap allocated below
+
+STBTT_DEF unsigned char * stbtt_GetGlyphSDF(const stbtt_fontinfo *info, float scale, int glyph, int padding, unsigned char onedge_value, float pixel_dist_scale, int *width, int *height, int *xoff, int *yoff);
+STBTT_DEF unsigned char * stbtt_GetCodepointSDF(const stbtt_fontinfo *info, float scale, int codepoint, int padding, unsigned char onedge_value, float pixel_dist_scale, int *width, int *height, int *xoff, int *yoff);
+// These functions compute a discretized SDF field for a single character, suitable for storing
+// in a single-channel texture, sampling with bilinear filtering, and testing whether
+// samples are larger than some threshold to produce scalable fonts.
+// info -- the font
+// scale -- controls the size of the resulting SDF bitmap, same as it would be creating a regular bitmap
+// glyph/codepoint -- the character to generate the SDF for
+// padding -- extra "pixels" around the character which are filled with the distance to the character (not 0),
+// which allows effects like bit outlines
+// onedge_value -- value 0-255 to test the SDF against to reconstruct the character (i.e. the isocontour of the character)
+// pixel_dist_scale -- what value the SDF should increase by when moving one SDF "pixel" away from the edge (on the 0..255 scale)
+// if positive, > onedge_value is inside; if negative, < onedge_value is inside
+// width,height -- output height & width of the SDF bitmap (including padding)
+// xoff,yoff -- output origin of the character
+// return value -- a 2D array of bytes 0..255, width*height in size
+//
+// pixel_dist_scale & onedge_value are a scale & bias that allows you to make
+// optimal use of the limited 0..255 for your application, trading off precision
+// and special effects. SDF values outside the range 0..255 are clamped to 0..255.
+//
+// Example:
+// scale = stbtt_ScaleForPixelHeight(22)
+// padding = 5
+// onedge_value = 180
+// pixel_dist_scale = 180/5.0 = 36.0
+//
+// This will create an SDF bitmap in which the character is about 22 pixels
+// high but the whole bitmap is about 22+5+5=32 pixels high. To produce a filled
+// shape, sample the SDF at each pixel and fill the pixel if the SDF value
+// is greater than or equal to 180/255. (You'll actually want to antialias,
+// which is beyond the scope of this example.) Additionally, you can compute
+// offset outlines (e.g. to stroke the character border inside & outside,
+// or only outside). For example, to fill outside the character up to 3 SDF
+// pixels, you would compare against (180-36.0*3)/255 = 72/255. The above
+// choice of variables maps a range from 5 pixels outside the shape to
+// 2 pixels inside the shape to 0..255; this is intended primarily for apply
+// outside effects only (the interior range is needed to allow proper
+// antialiasing of the font at *smaller* sizes)
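The worked numbers above reduce to a scale-and-bias: stored value = onedge_value + pixel_dist_scale * signed_distance, clamped to 0..255. A self-contained check of the example's arithmetic (this helper is illustrative, not the library's rasterizer):

```c
#include <assert.h>

/* Maps a signed distance in SDF "pixels" (positive = inside, for a positive
   pixel_dist_scale) to the stored byte, per the scheme described above. */
static int sdf_value(float onedge_value, float pixel_dist_scale, float signed_dist)
{
   float v = onedge_value + pixel_dist_scale * signed_dist;
   if (v < 0.0f)   v = 0.0f;    /* clamp to the representable range */
   if (v > 255.0f) v = 255.0f;
   return (int)v;
}
```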
+//
+// The function computes the SDF analytically at each SDF pixel, not by e.g.
+// building a higher-res bitmap and approximating it. In theory the quality
+// should be as high as possible for an SDF of this size & representation, but
+// it's unclear if this is true in practice (perhaps building a higher-res bitmap
+// and computing from that can allow drop-out prevention).
+//
+// The algorithm has not been optimized at all, so expect it to be slow
+// if computing lots of characters or very large sizes.
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Finding the right font...
+//
+// You should really just solve this offline, keep your own tables
+// of what font is what, and don't try to get it out of the .ttf file.
+// That's because getting it out of the .ttf file is really hard, because
+// the names in the file can appear in many possible encodings, in many
+// possible languages, and e.g. if you need a case-insensitive comparison,
+// the details of that depend on the encoding & language in a complex way
+// (actually underspecified in truetype, but also gigantic).
+//
+// But you can use the provided functions in two possible ways:
+// stbtt_FindMatchingFont() will use *case-sensitive* comparisons on
+// unicode-encoded names to try to find the font you want;
+// you can run this before calling stbtt_InitFont()
+//
+// stbtt_GetFontNameString() lets you get any of the various strings
+// from the file yourself and do your own comparisons on them.
+// You have to have called stbtt_InitFont() first.
+
+
+STBTT_DEF int stbtt_FindMatchingFont(const unsigned char *fontdata, const char *name, int flags);
+// returns the offset (not index) of the font that matches, or -1 if none
+// if you use STBTT_MACSTYLE_DONTCARE, use a font name like "Arial Bold".
+// if you use any other flag, use a font name like "Arial"; this checks
+// the 'macStyle' header field; i don't know if fonts set this consistently
+#define STBTT_MACSTYLE_DONTCARE 0
+#define STBTT_MACSTYLE_BOLD 1
+#define STBTT_MACSTYLE_ITALIC 2
+#define STBTT_MACSTYLE_UNDERSCORE 4
+#define STBTT_MACSTYLE_NONE 8 // <= not same as 0, this makes us check the bitfield is 0
+
+STBTT_DEF int stbtt_CompareUTF8toUTF16_bigendian(const char *s1, int len1, const char *s2, int len2);
+// returns 1/0 whether the first string interpreted as utf8 is identical to
+// the second string interpreted as big-endian utf16... useful for strings from next func
+
+STBTT_DEF const char *stbtt_GetFontNameString(const stbtt_fontinfo *font, int *length, int platformID, int encodingID, int languageID, int nameID);
+// returns the string (which may be big-endian double byte, e.g. for unicode)
+// and puts the length in bytes in *length.
+//
+// some of the values for the IDs are below; for more see the truetype spec:
+// http://developer.apple.com/textfonts/TTRefMan/RM06/Chap6name.html
+// http://www.microsoft.com/typography/otspec/name.htm
+
+enum { // platformID
+ STBTT_PLATFORM_ID_UNICODE =0,
+ STBTT_PLATFORM_ID_MAC =1,
+ STBTT_PLATFORM_ID_ISO =2,
+ STBTT_PLATFORM_ID_MICROSOFT =3
+};
+
+enum { // encodingID for STBTT_PLATFORM_ID_UNICODE
+ STBTT_UNICODE_EID_UNICODE_1_0 =0,
+ STBTT_UNICODE_EID_UNICODE_1_1 =1,
+ STBTT_UNICODE_EID_ISO_10646 =2,
+ STBTT_UNICODE_EID_UNICODE_2_0_BMP=3,
+ STBTT_UNICODE_EID_UNICODE_2_0_FULL=4
+};
+
+enum { // encodingID for STBTT_PLATFORM_ID_MICROSOFT
+ STBTT_MS_EID_SYMBOL =0,
+ STBTT_MS_EID_UNICODE_BMP =1,
+ STBTT_MS_EID_SHIFTJIS =2,
+ STBTT_MS_EID_UNICODE_FULL =10
+};
+
+enum { // encodingID for STBTT_PLATFORM_ID_MAC; same as Script Manager codes
+ STBTT_MAC_EID_ROMAN =0, STBTT_MAC_EID_ARABIC =4,
+ STBTT_MAC_EID_JAPANESE =1, STBTT_MAC_EID_HEBREW =5,
+ STBTT_MAC_EID_CHINESE_TRAD =2, STBTT_MAC_EID_GREEK =6,
+ STBTT_MAC_EID_KOREAN =3, STBTT_MAC_EID_RUSSIAN =7
+};
+
+enum { // languageID for STBTT_PLATFORM_ID_MICROSOFT; same as LCID...
+ // problematic because there are e.g. 16 english LCIDs and 16 arabic LCIDs
+ STBTT_MS_LANG_ENGLISH =0x0409, STBTT_MS_LANG_ITALIAN =0x0410,
+ STBTT_MS_LANG_CHINESE =0x0804, STBTT_MS_LANG_JAPANESE =0x0411,
+ STBTT_MS_LANG_DUTCH =0x0413, STBTT_MS_LANG_KOREAN =0x0412,
+ STBTT_MS_LANG_FRENCH =0x040c, STBTT_MS_LANG_RUSSIAN =0x0419,
+ STBTT_MS_LANG_GERMAN =0x0407, STBTT_MS_LANG_SPANISH =0x0409,
+ STBTT_MS_LANG_HEBREW =0x040d, STBTT_MS_LANG_SWEDISH =0x041D
+};
+
+enum { // languageID for STBTT_PLATFORM_ID_MAC
+ STBTT_MAC_LANG_ENGLISH =0 , STBTT_MAC_LANG_JAPANESE =11,
+ STBTT_MAC_LANG_ARABIC =12, STBTT_MAC_LANG_KOREAN =23,
+ STBTT_MAC_LANG_DUTCH =4 , STBTT_MAC_LANG_RUSSIAN =32,
+ STBTT_MAC_LANG_FRENCH =1 , STBTT_MAC_LANG_SPANISH =6 ,
+ STBTT_MAC_LANG_GERMAN =2 , STBTT_MAC_LANG_SWEDISH =5 ,
+ STBTT_MAC_LANG_HEBREW =10, STBTT_MAC_LANG_CHINESE_SIMPLIFIED =33,
+ STBTT_MAC_LANG_ITALIAN =3 , STBTT_MAC_LANG_CHINESE_TRAD =19
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // __STB_INCLUDE_STB_TRUETYPE_H__
+
+///////////////////////////////////////////////////////////////////////////////
+///////////////////////////////////////////////////////////////////////////////
+////
+//// IMPLEMENTATION
+////
+////
+
+#ifdef STB_TRUETYPE_IMPLEMENTATION
+
+#ifndef STBTT_MAX_OVERSAMPLE
+#define STBTT_MAX_OVERSAMPLE 8
+#endif
+
+#if STBTT_MAX_OVERSAMPLE > 255
+#error "STBTT_MAX_OVERSAMPLE cannot be > 255"
+#endif
+
+typedef int stbtt__test_oversample_pow2[(STBTT_MAX_OVERSAMPLE & (STBTT_MAX_OVERSAMPLE-1)) == 0 ? 1 : -1];
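The typedef above is a pre-C11 compile-time assertion: the array gets length -1 (a compile error) unless STBTT_MAX_OVERSAMPLE is a power of two. The same `(x & (x-1)) == 0` bit trick works at runtime:

```c
#include <assert.h>

/* Non-zero iff x is a power of two; the same test the typedef above uses.
   Note it also reports 0 as a "power of two", which the typedef tolerates
   because STBTT_MAX_OVERSAMPLE is known to be >= 1. */
static int is_pow2(unsigned int x)
{
   return (x & (x - 1)) == 0;
}
```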
+
+#ifndef STBTT_RASTERIZER_VERSION
+#define STBTT_RASTERIZER_VERSION 2
+#endif
+
+#ifdef _MSC_VER
+#define STBTT__NOTUSED(v) (void)(v)
+#else
+#define STBTT__NOTUSED(v) (void)sizeof(v)
+#endif
+
+//////////////////////////////////////////////////////////////////////////
+//
+// stbtt__buf helpers to parse data from file
+//
+
+static stbtt_uint8 stbtt__buf_get8(stbtt__buf *b)
+{
+ if (b->cursor >= b->size)
+ return 0;
+ return b->data[b->cursor++];
+}
+
+static stbtt_uint8 stbtt__buf_peek8(stbtt__buf *b)
+{
+ if (b->cursor >= b->size)
+ return 0;
+ return b->data[b->cursor];
+}
+
+static void stbtt__buf_seek(stbtt__buf *b, int o)
+{
+ STBTT_assert(!(o > b->size || o < 0));
+ b->cursor = (o > b->size || o < 0) ? b->size : o;
+}
+
+static void stbtt__buf_skip(stbtt__buf *b, int o)
+{
+ stbtt__buf_seek(b, b->cursor + o);
+}
+
+static stbtt_uint32 stbtt__buf_get(stbtt__buf *b, int n)
+{
+ stbtt_uint32 v = 0;
+ int i;
+ STBTT_assert(n >= 1 && n <= 4);
+ for (i = 0; i < n; i++)
+ v = (v << 8) | stbtt__buf_get8(b);
+ return v;
+}
+
+static stbtt__buf stbtt__new_buf(const void *p, size_t size)
+{
+ stbtt__buf r;
+ STBTT_assert(size < 0x40000000);
+ r.data = (stbtt_uint8*) p;
+ r.size = (int) size;
+ r.cursor = 0;
+ return r;
+}
+
+#define stbtt__buf_get16(b) stbtt__buf_get((b), 2)
+#define stbtt__buf_get32(b) stbtt__buf_get((b), 4)
+
+static stbtt__buf stbtt__buf_range(const stbtt__buf *b, int o, int s)
+{
+ stbtt__buf r = stbtt__new_buf(NULL, 0);
+ if (o < 0 || s < 0 || o > b->size || s > b->size - o) return r;
+ r.data = b->data + o;
+ r.size = s;
+ return r;
+}
+
+static stbtt__buf stbtt__cff_get_index(stbtt__buf *b)
+{
+ int count, start, offsize;
+ start = b->cursor;
+ count = stbtt__buf_get16(b);
+ if (count) {
+ offsize = stbtt__buf_get8(b);
+ STBTT_assert(offsize >= 1 && offsize <= 4);
+ stbtt__buf_skip(b, offsize * count);
+ stbtt__buf_skip(b, stbtt__buf_get(b, offsize) - 1);
+ }
+ return stbtt__buf_range(b, start, b->cursor - start);
+}
+
+static stbtt_uint32 stbtt__cff_int(stbtt__buf *b)
+{
+ int b0 = stbtt__buf_get8(b);
+ if (b0 >= 32 && b0 <= 246) return b0 - 139;
+ else if (b0 >= 247 && b0 <= 250) return (b0 - 247)*256 + stbtt__buf_get8(b) + 108;
+ else if (b0 >= 251 && b0 <= 254) return -(b0 - 251)*256 - stbtt__buf_get8(b) - 108;
+ else if (b0 == 28) return stbtt__buf_get16(b);
+ else if (b0 == 29) return stbtt__buf_get32(b);
+ STBTT_assert(0);
+ return 0;
+}
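The branches above implement the standard CFF DICT integer operand encoding. A standalone mirror over a plain byte array (same ranges and formulas as stbtt__cff_int, minus the 5-byte case, independent of stbtt__buf) makes the cases easy to verify:

```c
#include <assert.h>

/* Decodes one CFF DICT integer operand starting at p[*i], advancing *i.
   Same encoding as stbtt__cff_int above. */
static int cff_int(const unsigned char *p, int *i)
{
   int b0 = p[(*i)++];
   if (b0 >= 32 && b0 <= 246) return b0 - 139;                           /* one byte: -107..107 */
   if (b0 >= 247 && b0 <= 250) return (b0 - 247)*256 + p[(*i)++] + 108;  /* two bytes, positive */
   if (b0 >= 251 && b0 <= 254) return -(b0 - 251)*256 - p[(*i)++] - 108; /* two bytes, negative */
   if (b0 == 28) { *i += 2; return (short)((p[*i-2] << 8) | p[*i-1]); }  /* 16-bit */
   return 0; /* b0 == 29 (32-bit) omitted from this sketch */
}
```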
+
+static void stbtt__cff_skip_operand(stbtt__buf *b) {
+ int v, b0 = stbtt__buf_peek8(b);
+ STBTT_assert(b0 >= 28);
+ if (b0 == 30) {
+ stbtt__buf_skip(b, 1);
+ while (b->cursor < b->size) {
+ v = stbtt__buf_get8(b);
+ if ((v & 0xF) == 0xF || (v >> 4) == 0xF)
+ break;
+ }
+ } else {
+ stbtt__cff_int(b);
+ }
+}
+
+static stbtt__buf stbtt__dict_get(stbtt__buf *b, int key)
+{
+ stbtt__buf_seek(b, 0);
+ while (b->cursor < b->size) {
+ int start = b->cursor, end, op;
+ while (stbtt__buf_peek8(b) >= 28)
+ stbtt__cff_skip_operand(b);
+ end = b->cursor;
+ op = stbtt__buf_get8(b);
+ if (op == 12) op = stbtt__buf_get8(b) | 0x100;
+ if (op == key) return stbtt__buf_range(b, start, end-start);
+ }
+ return stbtt__buf_range(b, 0, 0);
+}
+
+static void stbtt__dict_get_ints(stbtt__buf *b, int key, int outcount, stbtt_uint32 *out)
+{
+ int i;
+ stbtt__buf operands = stbtt__dict_get(b, key);
+ for (i = 0; i < outcount && operands.cursor < operands.size; i++)
+ out[i] = stbtt__cff_int(&operands);
+}
+
+static int stbtt__cff_index_count(stbtt__buf *b)
+{
+ stbtt__buf_seek(b, 0);
+ return stbtt__buf_get16(b);
+}
+
+static stbtt__buf stbtt__cff_index_get(stbtt__buf b, int i)
+{
+ int count, offsize, start, end;
+ stbtt__buf_seek(&b, 0);
+ count = stbtt__buf_get16(&b);
+ offsize = stbtt__buf_get8(&b);
+ STBTT_assert(i >= 0 && i < count);
+ STBTT_assert(offsize >= 1 && offsize <= 4);
+ stbtt__buf_skip(&b, i*offsize);
+ start = stbtt__buf_get(&b, offsize);
+ end = stbtt__buf_get(&b, offsize);
+ return stbtt__buf_range(&b, 2+(count+1)*offsize+start, end - start);
+}
+
+//////////////////////////////////////////////////////////////////////////
+//
+// accessors to parse data from file
+//
+
+// on platforms that don't allow misaligned reads, if we want to allow
+// truetype fonts that aren't padded to alignment, define ALLOW_UNALIGNED_TRUETYPE
+
+#define ttBYTE(p) (* (stbtt_uint8 *) (p))
+#define ttCHAR(p) (* (stbtt_int8 *) (p))
+#define ttFixed(p) ttLONG(p)
+
+static stbtt_uint16 ttUSHORT(stbtt_uint8 *p) { return p[0]*256 + p[1]; }
+static stbtt_int16 ttSHORT(stbtt_uint8 *p) { return p[0]*256 + p[1]; }
+static stbtt_uint32 ttULONG(stbtt_uint8 *p) { return (p[0]<<24) + (p[1]<<16) + (p[2]<<8) + p[3]; }
+static stbtt_int32 ttLONG(stbtt_uint8 *p) { return (p[0]<<24) + (p[1]<<16) + (p[2]<<8) + p[3]; }
+
+#define stbtt_tag4(p,c0,c1,c2,c3) ((p)[0] == (c0) && (p)[1] == (c1) && (p)[2] == (c2) && (p)[3] == (c3))
+#define stbtt_tag(p,str) stbtt_tag4(p,str[0],str[1],str[2],str[3])
+
+static int stbtt__isfont(stbtt_uint8 *font)
+{
+ // check the version number
+ if (stbtt_tag4(font, '1',0,0,0)) return 1; // TrueType 1
+ if (stbtt_tag(font, "typ1")) return 1; // TrueType with type 1 font -- we don't support this!
+ if (stbtt_tag(font, "OTTO")) return 1; // OpenType with CFF
+ if (stbtt_tag4(font, 0,1,0,0)) return 1; // OpenType 1.0
+ if (stbtt_tag(font, "true")) return 1; // Apple specification for TrueType fonts
+ return 0;
+}
+
+// @OPTIMIZE: binary search
+static stbtt_uint32 stbtt__find_table(stbtt_uint8 *data, stbtt_uint32 fontstart, const char *tag)
+{
+ stbtt_int32 num_tables = ttUSHORT(data+fontstart+4);
+ stbtt_uint32 tabledir = fontstart + 12;
+ stbtt_int32 i;
+ for (i=0; i < num_tables; ++i) {
+ stbtt_uint32 loc = tabledir + 16*i;
+ if (stbtt_tag(data+loc+0, tag))
+ return ttULONG(data+loc+8);
+ }
+ return 0;
+}
+
+static int stbtt_GetFontOffsetForIndex_internal(unsigned char *font_collection, int index)
+{
+ // if it's just a font, there's only one valid index
+ if (stbtt__isfont(font_collection))
+ return index == 0 ? 0 : -1;
+
+ // check if it's a TTC
+ if (stbtt_tag(font_collection, "ttcf")) {
+ // version 1 or 2?
+ if (ttULONG(font_collection+4) == 0x00010000 || ttULONG(font_collection+4) == 0x00020000) {
+ stbtt_int32 n = ttLONG(font_collection+8);
+ if (index >= n)
+ return -1;
+ return ttULONG(font_collection+12+index*4);
+ }
+ }
+ return -1;
+}
+
+static int stbtt_GetNumberOfFonts_internal(unsigned char *font_collection)
+{
+ // if it's just a font, there's only one valid font
+ if (stbtt__isfont(font_collection))
+ return 1;
+
+ // check if it's a TTC
+ if (stbtt_tag(font_collection, "ttcf")) {
+ // version 1 or 2?
+ if (ttULONG(font_collection+4) == 0x00010000 || ttULONG(font_collection+4) == 0x00020000) {
+ return ttLONG(font_collection+8);
+ }
+ }
+ return 0;
+}
+
+static stbtt__buf stbtt__get_subrs(stbtt__buf cff, stbtt__buf fontdict)
+{
+ stbtt_uint32 subrsoff = 0, private_loc[2] = { 0, 0 };
+ stbtt__buf pdict;
+ stbtt__dict_get_ints(&fontdict, 18, 2, private_loc);
+ if (!private_loc[1] || !private_loc[0]) return stbtt__new_buf(NULL, 0);
+ pdict = stbtt__buf_range(&cff, private_loc[1], private_loc[0]);
+ stbtt__dict_get_ints(&pdict, 19, 1, &subrsoff);
+ if (!subrsoff) return stbtt__new_buf(NULL, 0);
+ stbtt__buf_seek(&cff, private_loc[1]+subrsoff);
+ return stbtt__cff_get_index(&cff);
+}
+
+// since most people won't use this, find this table the first time it's needed
+static int stbtt__get_svg(stbtt_fontinfo *info)
+{
+ stbtt_uint32 t;
+ if (info->svg < 0) {
+ t = stbtt__find_table(info->data, info->fontstart, "SVG ");
+ if (t) {
+ stbtt_uint32 offset = ttULONG(info->data + t + 2);
+ info->svg = t + offset;
+ } else {
+ info->svg = 0;
+ }
+ }
+ return info->svg;
+}
+
+static int stbtt_InitFont_internal(stbtt_fontinfo *info, unsigned char *data, int fontstart)
+{
+ stbtt_uint32 cmap, t;
+ stbtt_int32 i,numTables;
+
+ info->data = data;
+ info->fontstart = fontstart;
+ info->cff = stbtt__new_buf(NULL, 0);
+
+ cmap = stbtt__find_table(data, fontstart, "cmap"); // required
+ info->loca = stbtt__find_table(data, fontstart, "loca"); // required
+ info->head = stbtt__find_table(data, fontstart, "head"); // required
+ info->glyf = stbtt__find_table(data, fontstart, "glyf"); // required
+ info->hhea = stbtt__find_table(data, fontstart, "hhea"); // required
+ info->hmtx = stbtt__find_table(data, fontstart, "hmtx"); // required
+ info->kern = stbtt__find_table(data, fontstart, "kern"); // not required
+ info->gpos = stbtt__find_table(data, fontstart, "GPOS"); // not required
+
+ if (!cmap || !info->head || !info->hhea || !info->hmtx)
+ return 0;
+ if (info->glyf) {
+ // required for truetype
+ if (!info->loca) return 0;
+ } else {
+ // initialization for CFF / Type2 fonts (OTF)
+ stbtt__buf b, topdict, topdictidx;
+ stbtt_uint32 cstype = 2, charstrings = 0, fdarrayoff = 0, fdselectoff = 0;
+ stbtt_uint32 cff;
+
+ cff = stbtt__find_table(data, fontstart, "CFF ");
+ if (!cff) return 0;
+
+ info->fontdicts = stbtt__new_buf(NULL, 0);
+ info->fdselect = stbtt__new_buf(NULL, 0);
+
+ // @TODO this should use size from table (not 512MB)
+ info->cff = stbtt__new_buf(data+cff, 512*1024*1024);
+ b = info->cff;
+
+ // read the header
+ stbtt__buf_skip(&b, 2);
+ stbtt__buf_seek(&b, stbtt__buf_get8(&b)); // hdrsize
+
+ // @TODO the name INDEX could list multiple fonts,
+ // but we just use the first one.
+ stbtt__cff_get_index(&b); // name INDEX
+ topdictidx = stbtt__cff_get_index(&b);
+ topdict = stbtt__cff_index_get(topdictidx, 0);
+ stbtt__cff_get_index(&b); // string INDEX
+ info->gsubrs = stbtt__cff_get_index(&b);
+
+ stbtt__dict_get_ints(&topdict, 17, 1, &charstrings);
+ stbtt__dict_get_ints(&topdict, 0x100 | 6, 1, &cstype);
+ stbtt__dict_get_ints(&topdict, 0x100 | 36, 1, &fdarrayoff);
+ stbtt__dict_get_ints(&topdict, 0x100 | 37, 1, &fdselectoff);
+ info->subrs = stbtt__get_subrs(b, topdict);
+
+ // we only support Type 2 charstrings
+ if (cstype != 2) return 0;
+ if (charstrings == 0) return 0;
+
+ if (fdarrayoff) {
+ // looks like a CID font
+ if (!fdselectoff) return 0;
+ stbtt__buf_seek(&b, fdarrayoff);
+ info->fontdicts = stbtt__cff_get_index(&b);
+ info->fdselect = stbtt__buf_range(&b, fdselectoff, b.size-fdselectoff);
+ }
+
+ stbtt__buf_seek(&b, charstrings);
+ info->charstrings = stbtt__cff_get_index(&b);
+ }
+
+ t = stbtt__find_table(data, fontstart, "maxp");
+ if (t)
+ info->numGlyphs = ttUSHORT(data+t+4);
+ else
+ info->numGlyphs = 0xffff;
+
+ info->svg = -1;
+
+ // find a cmap encoding table we understand *now* to avoid searching
+ // for one later; the chosen table is the same regardless of glyph.
+ // (todo: could make this installable)
+ numTables = ttUSHORT(data + cmap + 2);
+ info->index_map = 0;
+ for (i=0; i < numTables; ++i) {
+ stbtt_uint32 encoding_record = cmap + 4 + 8 * i;
+ // find an encoding we understand:
+ switch(ttUSHORT(data+encoding_record)) {
+ case STBTT_PLATFORM_ID_MICROSOFT:
+ switch (ttUSHORT(data+encoding_record+2)) {
+ case STBTT_MS_EID_UNICODE_BMP:
+ case STBTT_MS_EID_UNICODE_FULL:
+ // MS/Unicode
+ info->index_map = cmap + ttULONG(data+encoding_record+4);
+ break;
+ }
+ break;
+ case STBTT_PLATFORM_ID_UNICODE:
+ // Mac/iOS has these
+ // all the encodingIDs are unicode, so we don't bother to check it
+ info->index_map = cmap + ttULONG(data+encoding_record+4);
+ break;
+ }
+ }
+ if (info->index_map == 0)
+ return 0;
+
+ info->indexToLocFormat = ttUSHORT(data+info->head + 50);
+ return 1;
+}
+
+STBTT_DEF int stbtt_FindGlyphIndex(const stbtt_fontinfo *info, int unicode_codepoint)
+{
+ stbtt_uint8 *data = info->data;
+ stbtt_uint32 index_map = info->index_map;
+
+ stbtt_uint16 format = ttUSHORT(data + index_map + 0);
+ if (format == 0) { // apple byte encoding
+ stbtt_int32 bytes = ttUSHORT(data + index_map + 2);
+ if (unicode_codepoint < bytes-6)
+ return ttBYTE(data + index_map + 6 + unicode_codepoint);
+ return 0;
+ } else if (format == 6) {
+ stbtt_uint32 first = ttUSHORT(data + index_map + 6);
+ stbtt_uint32 count = ttUSHORT(data + index_map + 8);
+ if ((stbtt_uint32) unicode_codepoint >= first && (stbtt_uint32) unicode_codepoint < first+count)
+ return ttUSHORT(data + index_map + 10 + (unicode_codepoint - first)*2);
+ return 0;
+ } else if (format == 2) {
+ STBTT_assert(0); // @TODO: high-byte mapping for japanese/chinese/korean
+ return 0;
+ } else if (format == 4) { // standard mapping for windows fonts: binary search collection of ranges
+ stbtt_uint16 segcount = ttUSHORT(data+index_map+6) >> 1;
+ stbtt_uint16 searchRange = ttUSHORT(data+index_map+8) >> 1;
+ stbtt_uint16 entrySelector = ttUSHORT(data+index_map+10);
+ stbtt_uint16 rangeShift = ttUSHORT(data+index_map+12) >> 1;
+
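+ // cmap format 4 stores, after its 14-byte header: endCode[segcount],
+ // a 2-byte reservedPad, startCode[segcount], idDelta[segcount], and
+ // idRangeOffset[segcount], followed by glyphIdArray[] -- which is where
+ // the "14 + segcount*2 + 2 + ..." offsets computed below come from.
+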
+ // do a binary search of the segments
+ stbtt_uint32 endCount = index_map + 14;
+ stbtt_uint32 search = endCount;
+
+ if (unicode_codepoint > 0xffff)
+ return 0;
+
+ // they lie from endCount .. endCount + segCount
+ // but searchRange is the nearest power of two, so...
+ if (unicode_codepoint >= ttUSHORT(data + search + rangeShift*2))
+ search += rangeShift*2;
+
+ // now decrement to bias correctly to find smallest
+ search -= 2;
+ while (entrySelector) {
+ stbtt_uint16 end;
+ searchRange >>= 1;
+ end = ttUSHORT(data + search + searchRange*2);
+ if (unicode_codepoint > end)
+ search += searchRange*2;
+ --entrySelector;
+ }
+ search += 2;
+
+ {
+ stbtt_uint16 offset, start, last;
+ stbtt_uint16 item = (stbtt_uint16) ((search - endCount) >> 1);
+
+ start = ttUSHORT(data + index_map + 14 + segcount*2 + 2 + 2*item);
+ last = ttUSHORT(data + endCount + 2*item);
+ if (unicode_codepoint < start || unicode_codepoint > last)
+ return 0;
+
+ offset = ttUSHORT(data + index_map + 14 + segcount*6 + 2 + 2*item);
+ if (offset == 0)
+ return (stbtt_uint16) (unicode_codepoint + ttSHORT(data + index_map + 14 + segcount*4 + 2 + 2*item));
+
+ return ttUSHORT(data + offset + (unicode_codepoint-start)*2 + index_map + 14 + segcount*6 + 2 + 2*item);
+ }
+ } else if (format == 12 || format == 13) {
+ stbtt_uint32 ngroups = ttULONG(data+index_map+12);
+ stbtt_int32 low,high;
+ low = 0; high = (stbtt_int32)ngroups;
+ // Binary search the right group.
+ while (low < high) {
+ stbtt_int32 mid = low + ((high-low) >> 1); // rounds down, so low <= mid < high
+ stbtt_uint32 start_char = ttULONG(data+index_map+16+mid*12);
+ stbtt_uint32 end_char = ttULONG(data+index_map+16+mid*12+4);
+ if ((stbtt_uint32) unicode_codepoint < start_char)
+ high = mid;
+ else if ((stbtt_uint32) unicode_codepoint > end_char)
+ low = mid+1;
+ else {
+ stbtt_uint32 start_glyph = ttULONG(data+index_map+16+mid*12+8);
+ if (format == 12)
+ return start_glyph + unicode_codepoint-start_char;
+ else // format == 13
+ return start_glyph;
+ }
+ }
+ return 0; // not found
+ }
+ // @TODO
+ STBTT_assert(0);
+ return 0;
+}
+
+STBTT_DEF int stbtt_GetCodepointShape(const stbtt_fontinfo *info, int unicode_codepoint, stbtt_vertex **vertices)
+{
+ return stbtt_GetGlyphShape(info, stbtt_FindGlyphIndex(info, unicode_codepoint), vertices);
+}
+
+static void stbtt_setvertex(stbtt_vertex *v, stbtt_uint8 type, stbtt_int32 x, stbtt_int32 y, stbtt_int32 cx, stbtt_int32 cy)
+{
+ v->type = type;
+ v->x = (stbtt_int16) x;
+ v->y = (stbtt_int16) y;
+ v->cx = (stbtt_int16) cx;
+ v->cy = (stbtt_int16) cy;
+}
+
+static int stbtt__GetGlyfOffset(const stbtt_fontinfo *info, int glyph_index)
+{
+ int g1,g2;
+
+ STBTT_assert(!info->cff.size);
+
+ if (glyph_index >= info->numGlyphs) return -1; // glyph index out of range
+ if (info->indexToLocFormat >= 2) return -1; // unknown index->glyph map format
+
+ if (info->indexToLocFormat == 0) {
+ g1 = info->glyf + ttUSHORT(info->data + info->loca + glyph_index * 2) * 2;
+ g2 = info->glyf + ttUSHORT(info->data + info->loca + glyph_index * 2 + 2) * 2;
+ } else {
+ g1 = info->glyf + ttULONG (info->data + info->loca + glyph_index * 4);
+ g2 = info->glyf + ttULONG (info->data + info->loca + glyph_index * 4 + 4);
+ }
+
+ return g1==g2 ? -1 : g1; // if length is 0, return -1
+}
+
+static int stbtt__GetGlyphInfoT2(const stbtt_fontinfo *info, int glyph_index, int *x0, int *y0, int *x1, int *y1);
+
+STBTT_DEF int stbtt_GetGlyphBox(const stbtt_fontinfo *info, int glyph_index, int *x0, int *y0, int *x1, int *y1)
+{
+ if (info->cff.size) {
+ stbtt__GetGlyphInfoT2(info, glyph_index, x0, y0, x1, y1);
+ } else {
+ int g = stbtt__GetGlyfOffset(info, glyph_index);
+ if (g < 0) return 0;
+
+ if (x0) *x0 = ttSHORT(info->data + g + 2);
+ if (y0) *y0 = ttSHORT(info->data + g + 4);
+ if (x1) *x1 = ttSHORT(info->data + g + 6);
+ if (y1) *y1 = ttSHORT(info->data + g + 8);
+ }
+ return 1;
+}
+
+STBTT_DEF int stbtt_GetCodepointBox(const stbtt_fontinfo *info, int codepoint, int *x0, int *y0, int *x1, int *y1)
+{
+ return stbtt_GetGlyphBox(info, stbtt_FindGlyphIndex(info,codepoint), x0,y0,x1,y1);
+}
+
+STBTT_DEF int stbtt_IsGlyphEmpty(const stbtt_fontinfo *info, int glyph_index)
+{
+ stbtt_int16 numberOfContours;
+ int g;
+ if (info->cff.size)
+ return stbtt__GetGlyphInfoT2(info, glyph_index, NULL, NULL, NULL, NULL) == 0;
+ g = stbtt__GetGlyfOffset(info, glyph_index);
+ if (g < 0) return 1;
+ numberOfContours = ttSHORT(info->data + g);
+ return numberOfContours == 0;
+}
+
+static int stbtt__close_shape(stbtt_vertex *vertices, int num_vertices, int was_off, int start_off,
+ stbtt_int32 sx, stbtt_int32 sy, stbtt_int32 scx, stbtt_int32 scy, stbtt_int32 cx, stbtt_int32 cy)
+{
+ if (start_off) {
+ if (was_off)
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vcurve, (cx+scx)>>1, (cy+scy)>>1, cx,cy);
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vcurve, sx,sy,scx,scy);
+ } else {
+ if (was_off)
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vcurve,sx,sy,cx,cy);
+ else
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vline,sx,sy,0,0);
+ }
+ return num_vertices;
+}
+
+static int stbtt__GetGlyphShapeTT(const stbtt_fontinfo *info, int glyph_index, stbtt_vertex **pvertices)
+{
+ stbtt_int16 numberOfContours;
+ stbtt_uint8 *endPtsOfContours;
+ stbtt_uint8 *data = info->data;
+ stbtt_vertex *vertices=0;
+ int num_vertices=0;
+ int g = stbtt__GetGlyfOffset(info, glyph_index);
+
+ *pvertices = NULL;
+
+ if (g < 0) return 0;
+
+ numberOfContours = ttSHORT(data + g);
+
+ if (numberOfContours > 0) {
+ stbtt_uint8 flags=0,flagcount;
+ stbtt_int32 ins, i,j=0,m,n, next_move, was_off=0, off, start_off=0;
+ stbtt_int32 x,y,cx,cy,sx,sy, scx,scy;
+ stbtt_uint8 *points;
+ endPtsOfContours = (data + g + 10);
+ ins = ttUSHORT(data + g + 10 + numberOfContours * 2);
+ points = data + g + 10 + numberOfContours * 2 + 2 + ins;
+
+ n = 1+ttUSHORT(endPtsOfContours + numberOfContours*2-2);
+
+ m = n + 2*numberOfContours; // a loose bound on how many vertices we might need
+ vertices = (stbtt_vertex *) STBTT_malloc(m * sizeof(vertices[0]), info->userdata);
+ if (vertices == 0)
+ return 0;
+
+ next_move = 0;
+ flagcount=0;
+
+ // in first pass, we load uninterpreted data into the allocated array
+ // above, shifted to the end of the array so we won't overwrite it when
+ // we create our final data starting from the front
+
+ off = m - n; // starting offset for uninterpreted data, regardless of how m ends up being calculated
+
+ // first load flags
+
+ for (i=0; i < n; ++i) {
+ if (flagcount == 0) {
+ flags = *points++;
+ if (flags & 8)
+ flagcount = *points++;
+ } else
+ --flagcount;
+ vertices[off+i].type = flags;
+ }
+
+ // now load x coordinates
+ x=0;
+ for (i=0; i < n; ++i) {
+ flags = vertices[off+i].type;
+ if (flags & 2) {
+ stbtt_int16 dx = *points++;
+ x += (flags & 16) ? dx : -dx; // bit 4 set means the short dx is positive
+ } else {
+ if (!(flags & 16)) {
+ x = x + (stbtt_int16) (points[0]*256 + points[1]);
+ points += 2;
+ }
+ }
+ vertices[off+i].x = (stbtt_int16) x;
+ }
+
+ // now load y coordinates
+ y=0;
+ for (i=0; i < n; ++i) {
+ flags = vertices[off+i].type;
+ if (flags & 4) {
+ stbtt_int16 dy = *points++;
+ y += (flags & 32) ? dy : -dy; // bit 5 set means the short dy is positive
+ } else {
+ if (!(flags & 32)) {
+ y = y + (stbtt_int16) (points[0]*256 + points[1]);
+ points += 2;
+ }
+ }
+ vertices[off+i].y = (stbtt_int16) y;
+ }
+
+ // now convert them to our format
+ num_vertices=0;
+ sx = sy = cx = cy = scx = scy = 0;
+ for (i=0; i < n; ++i) {
+ flags = vertices[off+i].type;
+ x = (stbtt_int16) vertices[off+i].x;
+ y = (stbtt_int16) vertices[off+i].y;
+
+ if (next_move == i) {
+ if (i != 0)
+ num_vertices = stbtt__close_shape(vertices, num_vertices, was_off, start_off, sx,sy,scx,scy,cx,cy);
+
+ // now start the new one
+ start_off = !(flags & 1);
+ if (start_off) {
+ // if we start off with an off-curve point, then we need to find a point on the curve
+ // where we can start, and we need to save some state for when we wrap around.
+ scx = x;
+ scy = y;
+ if (!(vertices[off+i+1].type & 1)) {
+ // next point is also a curve point, so interpolate an on-point curve
+ sx = (x + (stbtt_int32) vertices[off+i+1].x) >> 1;
+ sy = (y + (stbtt_int32) vertices[off+i+1].y) >> 1;
+ } else {
+ // otherwise just use the next point as our start point
+ sx = (stbtt_int32) vertices[off+i+1].x;
+ sy = (stbtt_int32) vertices[off+i+1].y;
+ ++i; // we're using point i+1 as the starting point, so skip it
+ }
+ } else {
+ sx = x;
+ sy = y;
+ }
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vmove,sx,sy,0,0);
+ was_off = 0;
+ next_move = 1 + ttUSHORT(endPtsOfContours+j*2);
+ ++j;
+ } else {
+ if (!(flags & 1)) { // if it's a curve
+ if (was_off) // two off-curve control points in a row means interpolate an on-curve midpoint
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vcurve, (cx+x)>>1, (cy+y)>>1, cx, cy);
+ cx = x;
+ cy = y;
+ was_off = 1;
+ } else {
+ if (was_off)
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vcurve, x,y, cx, cy);
+ else
+ stbtt_setvertex(&vertices[num_vertices++], STBTT_vline, x,y,0,0);
+ was_off = 0;
+ }
+ }
+ }
+ num_vertices = stbtt__close_shape(vertices, num_vertices, was_off, start_off, sx,sy,scx,scy,cx,cy);
+ } else if (numberOfContours < 0) {
+ // Compound shapes.
+ int more = 1;
+ stbtt_uint8 *comp = data + g + 10;
+ num_vertices = 0;
+ vertices = 0;
+ while (more) {
+ stbtt_uint16 flags, gidx;
+ int comp_num_verts = 0, i;
+ stbtt_vertex *comp_verts = 0, *tmp = 0;
+ float mtx[6] = {1,0,0,1,0,0}, m, n;
+
+ flags = ttSHORT(comp); comp+=2;
+ gidx = ttSHORT(comp); comp+=2;
+
+ if (flags & 2) { // XY values
+ if (flags & 1) { // shorts
+ mtx[4] = ttSHORT(comp); comp+=2;
+ mtx[5] = ttSHORT(comp); comp+=2;
+ } else {
+ mtx[4] = ttCHAR(comp); comp+=1;
+ mtx[5] = ttCHAR(comp); comp+=1;
+ }
+ }
+ else {
+ // @TODO handle matching point
+ STBTT_assert(0);
+ }
+ if (flags & (1<<3)) { // WE_HAVE_A_SCALE
+ mtx[0] = mtx[3] = ttSHORT(comp)/16384.0f; comp+=2;
+ mtx[1] = mtx[2] = 0;
+ } else if (flags & (1<<6)) { // WE_HAVE_AN_X_AND_YSCALE
+ mtx[0] = ttSHORT(comp)/16384.0f; comp+=2;
+ mtx[1] = mtx[2] = 0;
+ mtx[3] = ttSHORT(comp)/16384.0f; comp+=2;
+ } else if (flags & (1<<7)) { // WE_HAVE_A_TWO_BY_TWO
+ mtx[0] = ttSHORT(comp)/16384.0f; comp+=2;
+ mtx[1] = ttSHORT(comp)/16384.0f; comp+=2;
+ mtx[2] = ttSHORT(comp)/16384.0f; comp+=2;
+ mtx[3] = ttSHORT(comp)/16384.0f; comp+=2;
+ }
+
+ // Find transformation scales.
+ m = (float) STBTT_sqrt(mtx[0]*mtx[0] + mtx[1]*mtx[1]);
+ n = (float) STBTT_sqrt(mtx[2]*mtx[2] + mtx[3]*mtx[3]);
+
+ // Get indexed glyph.
+ comp_num_verts = stbtt_GetGlyphShape(info, gidx, &comp_verts);
+ if (comp_num_verts > 0) {
+ // Transform vertices.
+ for (i = 0; i < comp_num_verts; ++i) {
+ stbtt_vertex* v = &comp_verts[i];
+ stbtt_vertex_type x,y;
+ x=v->x; y=v->y;
+ v->x = (stbtt_vertex_type)(m * (mtx[0]*x + mtx[2]*y + mtx[4]));
+ v->y = (stbtt_vertex_type)(n * (mtx[1]*x + mtx[3]*y + mtx[5]));
+ x=v->cx; y=v->cy;
+ v->cx = (stbtt_vertex_type)(m * (mtx[0]*x + mtx[2]*y + mtx[4]));
+ v->cy = (stbtt_vertex_type)(n * (mtx[1]*x + mtx[3]*y + mtx[5]));
+ }
+ // Append vertices.
+ tmp = (stbtt_vertex*)STBTT_malloc((num_vertices+comp_num_verts)*sizeof(stbtt_vertex), info->userdata);
+ if (!tmp) {
+ if (vertices) STBTT_free(vertices, info->userdata);
+ if (comp_verts) STBTT_free(comp_verts, info->userdata);
+ return 0;
+ }
+ if (num_vertices > 0 && vertices) STBTT_memcpy(tmp, vertices, num_vertices*sizeof(stbtt_vertex));
+ STBTT_memcpy(tmp+num_vertices, comp_verts, comp_num_verts*sizeof(stbtt_vertex));
+ if (vertices) STBTT_free(vertices, info->userdata);
+ vertices = tmp;
+ STBTT_free(comp_verts, info->userdata);
+ num_vertices += comp_num_verts;
+ }
+ // More components ?
+ more = flags & (1<<5);
+ }
+ } else {
+ // numberOfContours == 0, do nothing
+ }
+
+ *pvertices = vertices;
+ return num_vertices;
+}
+
+typedef struct
+{
+ int bounds;
+ int started;
+ float first_x, first_y;
+ float x, y;
+ stbtt_int32 min_x, max_x, min_y, max_y;
+
+ stbtt_vertex *pvertices;
+ int num_vertices;
+} stbtt__csctx;
+
+#define STBTT__CSCTX_INIT(bounds) {bounds,0, 0,0, 0,0, 0,0,0,0, NULL, 0}
+
+static void stbtt__track_vertex(stbtt__csctx *c, stbtt_int32 x, stbtt_int32 y)
+{
+ if (x > c->max_x || !c->started) c->max_x = x;
+ if (y > c->max_y || !c->started) c->max_y = y;
+ if (x < c->min_x || !c->started) c->min_x = x;
+ if (y < c->min_y || !c->started) c->min_y = y;
+ c->started = 1;
+}
+
+static void stbtt__csctx_v(stbtt__csctx *c, stbtt_uint8 type, stbtt_int32 x, stbtt_int32 y, stbtt_int32 cx, stbtt_int32 cy, stbtt_int32 cx1, stbtt_int32 cy1)
+{
+ if (c->bounds) {
+ stbtt__track_vertex(c, x, y);
+ if (type == STBTT_vcubic) {
+ stbtt__track_vertex(c, cx, cy);
+ stbtt__track_vertex(c, cx1, cy1);
+ }
+ } else {
+ stbtt_setvertex(&c->pvertices[c->num_vertices], type, x, y, cx, cy);
+ c->pvertices[c->num_vertices].cx1 = (stbtt_int16) cx1;
+ c->pvertices[c->num_vertices].cy1 = (stbtt_int16) cy1;
+ }
+ c->num_vertices++;
+}
+
+static void stbtt__csctx_close_shape(stbtt__csctx *ctx)
+{
+ if (ctx->first_x != ctx->x || ctx->first_y != ctx->y)
+ stbtt__csctx_v(ctx, STBTT_vline, (int)ctx->first_x, (int)ctx->first_y, 0, 0, 0, 0);
+}
+
+static void stbtt__csctx_rmove_to(stbtt__csctx *ctx, float dx, float dy)
+{
+ stbtt__csctx_close_shape(ctx);
+ ctx->first_x = ctx->x = ctx->x + dx;
+ ctx->first_y = ctx->y = ctx->y + dy;
+ stbtt__csctx_v(ctx, STBTT_vmove, (int)ctx->x, (int)ctx->y, 0, 0, 0, 0);
+}
+
+static void stbtt__csctx_rline_to(stbtt__csctx *ctx, float dx, float dy)
+{
+ ctx->x += dx;
+ ctx->y += dy;
+ stbtt__csctx_v(ctx, STBTT_vline, (int)ctx->x, (int)ctx->y, 0, 0, 0, 0);
+}
+
+static void stbtt__csctx_rccurve_to(stbtt__csctx *ctx, float dx1, float dy1, float dx2, float dy2, float dx3, float dy3)
+{
+ float cx1 = ctx->x + dx1;
+ float cy1 = ctx->y + dy1;
+ float cx2 = cx1 + dx2;
+ float cy2 = cy1 + dy2;
+ ctx->x = cx2 + dx3;
+ ctx->y = cy2 + dy3;
+ stbtt__csctx_v(ctx, STBTT_vcubic, (int)ctx->x, (int)ctx->y, (int)cx1, (int)cy1, (int)cx2, (int)cy2);
+}
+
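+ // Subr numbers in charstrings are biased (per the Type 2 charstring spec)
+ // so that small, common indices fit in one-byte operands:
+ //   count < 1240   ->  bias = 107
+ //   count < 33900  ->  bias = 1131
+ //   otherwise      ->  bias = 32768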
+static stbtt__buf stbtt__get_subr(stbtt__buf idx, int n)
+{
+ int count = stbtt__cff_index_count(&idx);
+ int bias = 107;
+ if (count >= 33900)
+ bias = 32768;
+ else if (count >= 1240)
+ bias = 1131;
+ n += bias;
+ if (n < 0 || n >= count)
+ return stbtt__new_buf(NULL, 0);
+ return stbtt__cff_index_get(idx, n);
+}
+
+static stbtt__buf stbtt__cid_get_glyph_subrs(const stbtt_fontinfo *info, int glyph_index)
+{
+ stbtt__buf fdselect = info->fdselect;
+ int nranges, start, end, v, fmt, fdselector = -1, i;
+
+ stbtt__buf_seek(&fdselect, 0);
+ fmt = stbtt__buf_get8(&fdselect);
+ if (fmt == 0) {
+ // untested
+ stbtt__buf_skip(&fdselect, glyph_index);
+ fdselector = stbtt__buf_get8(&fdselect);
+ } else if (fmt == 3) {
+ nranges = stbtt__buf_get16(&fdselect);
+ start = stbtt__buf_get16(&fdselect);
+ for (i = 0; i < nranges; i++) {
+ v = stbtt__buf_get8(&fdselect);
+ end = stbtt__buf_get16(&fdselect);
+ if (glyph_index >= start && glyph_index < end) {
+ fdselector = v;
+ break;
+ }
+ start = end;
+ }
+ }
+ if (fdselector == -1) return stbtt__new_buf(NULL, 0); // no range covers this glyph
+ return stbtt__get_subrs(info->cff, stbtt__cff_index_get(info->fontdicts, fdselector));
+}
+
+static int stbtt__run_charstring(const stbtt_fontinfo *info, int glyph_index, stbtt__csctx *c)
+{
+ int in_header = 1, maskbits = 0, subr_stack_height = 0, sp = 0, v, i, b0;
+ int has_subrs = 0, clear_stack;
+ float s[48];
+ stbtt__buf subr_stack[10], subrs = info->subrs, b;
+ float f;
+
+#define STBTT__CSERR(s) (0)
+
+ // this currently ignores the initial width value, which isn't needed if we have hmtx
+ b = stbtt__cff_index_get(info->charstrings, glyph_index);
+ while (b.cursor < b.size) {
+ i = 0;
+ clear_stack = 1;
+ b0 = stbtt__buf_get8(&b);
+ switch (b0) {
+ // @TODO implement hinting
+ case 0x13: // hintmask
+ case 0x14: // cntrmask
+ if (in_header)
+ maskbits += (sp / 2); // implicit "vstem"
+ in_header = 0;
+ stbtt__buf_skip(&b, (maskbits + 7) / 8);
+ break;
+
+ case 0x01: // hstem
+ case 0x03: // vstem
+ case 0x12: // hstemhm
+ case 0x17: // vstemhm
+ maskbits += (sp / 2);
+ break;
+
+ case 0x15: // rmoveto
+ in_header = 0;
+ if (sp < 2) return STBTT__CSERR("rmoveto stack");
+ stbtt__csctx_rmove_to(c, s[sp-2], s[sp-1]);
+ break;
+ case 0x04: // vmoveto
+ in_header = 0;
+ if (sp < 1) return STBTT__CSERR("vmoveto stack");
+ stbtt__csctx_rmove_to(c, 0, s[sp-1]);
+ break;
+ case 0x16: // hmoveto
+ in_header = 0;
+ if (sp < 1) return STBTT__CSERR("hmoveto stack");
+ stbtt__csctx_rmove_to(c, s[sp-1], 0);
+ break;
+
+ case 0x05: // rlineto
+ if (sp < 2) return STBTT__CSERR("rlineto stack");
+ for (; i + 1 < sp; i += 2)
+ stbtt__csctx_rline_to(c, s[i], s[i+1]);
+ break;
+
+ // hlineto/vlineto and vhcurveto/hvcurveto alternate horizontal and vertical
+ // segments; each opcode of a pair starts the alternation on a different axis.
+
+ case 0x07: // vlineto
+ if (sp < 1) return STBTT__CSERR("vlineto stack");
+ goto vlineto;
+ case 0x06: // hlineto
+ if (sp < 1) return STBTT__CSERR("hlineto stack");
+ for (;;) {
+ if (i >= sp) break;
+ stbtt__csctx_rline_to(c, s[i], 0);
+ i++;
+ vlineto:
+ if (i >= sp) break;
+ stbtt__csctx_rline_to(c, 0, s[i]);
+ i++;
+ }
+ break;
+
+ case 0x1F: // hvcurveto
+ if (sp < 4) return STBTT__CSERR("hvcurveto stack");
+ goto hvcurveto;
+ case 0x1E: // vhcurveto
+ if (sp < 4) return STBTT__CSERR("vhcurveto stack");
+ for (;;) {
+ if (i + 3 >= sp) break;
+ stbtt__csctx_rccurve_to(c, 0, s[i], s[i+1], s[i+2], s[i+3], (sp - i == 5) ? s[i + 4] : 0.0f);
+ i += 4;
+ hvcurveto:
+ if (i + 3 >= sp) break;
+ stbtt__csctx_rccurve_to(c, s[i], 0, s[i+1], s[i+2], (sp - i == 5) ? s[i+4] : 0.0f, s[i+3]);
+ i += 4;
+ }
+ break;
+
+ case 0x08: // rrcurveto
+ if (sp < 6) return STBTT__CSERR("rrcurveto stack");
+ for (; i + 5 < sp; i += 6)
+ stbtt__csctx_rccurve_to(c, s[i], s[i+1], s[i+2], s[i+3], s[i+4], s[i+5]);
+ break;
+
+ case 0x18: // rcurveline
+ if (sp < 8) return STBTT__CSERR("rcurveline stack");
+ for (; i + 5 < sp - 2; i += 6)
+ stbtt__csctx_rccurve_to(c, s[i], s[i+1], s[i+2], s[i+3], s[i+4], s[i+5]);
+ if (i + 1 >= sp) return STBTT__CSERR("rcurveline stack");
+ stbtt__csctx_rline_to(c, s[i], s[i+1]);
+ break;
+
+ case 0x19: // rlinecurve
+ if (sp < 8) return STBTT__CSERR("rlinecurve stack");
+ for (; i + 1 < sp - 6; i += 2)
+ stbtt__csctx_rline_to(c, s[i], s[i+1]);
+ if (i + 5 >= sp) return STBTT__CSERR("rlinecurve stack");
+ stbtt__csctx_rccurve_to(c, s[i], s[i+1], s[i+2], s[i+3], s[i+4], s[i+5]);
+ break;
+
+ case 0x1A: // vvcurveto
+ case 0x1B: // hhcurveto
+ if (sp < 4) return STBTT__CSERR("(vv|hh)curveto stack");
+ f = 0.0;
+ if (sp & 1) { f = s[i]; i++; }
+ for (; i + 3 < sp; i += 4) {
+ if (b0 == 0x1B)
+ stbtt__csctx_rccurve_to(c, s[i], f, s[i+1], s[i+2], s[i+3], 0.0);
+ else
+ stbtt__csctx_rccurve_to(c, f, s[i], s[i+1], s[i+2], 0.0, s[i+3]);
+ f = 0.0;
+ }
+ break;
+
+ case 0x0A: // callsubr
+ if (!has_subrs) {
+ if (info->fdselect.size)
+ subrs = stbtt__cid_get_glyph_subrs(info, glyph_index);
+ has_subrs = 1;
+ }
+ // FALLTHROUGH
+ case 0x1D: // callgsubr
+ if (sp < 1) return STBTT__CSERR("call(g|)subr stack");
+ v = (int) s[--sp];
+ if (subr_stack_height >= 10) return STBTT__CSERR("recursion limit");
+ subr_stack[subr_stack_height++] = b;
+ b = stbtt__get_subr(b0 == 0x0A ? subrs : info->gsubrs, v);
+ if (b.size == 0) return STBTT__CSERR("subr not found");
+ b.cursor = 0;
+ clear_stack = 0;
+ break;
+
+ case 0x0B: // return
+ if (subr_stack_height <= 0) return STBTT__CSERR("return outside subr");
+ b = subr_stack[--subr_stack_height];
+ clear_stack = 0;
+ break;
+
+ case 0x0E: // endchar
+ stbtt__csctx_close_shape(c);
+ return 1;
+
+ case 0x0C: { // two-byte escape
+ float dx1, dx2, dx3, dx4, dx5, dx6, dy1, dy2, dy3, dy4, dy5, dy6;
+ float dx, dy;
+ int b1 = stbtt__buf_get8(&b);
+ switch (b1) {
+ // @TODO These "flex" implementations ignore the flex-depth and resolution,
+ // and always draw beziers.
+ case 0x22: // hflex
+ if (sp < 7) return STBTT__CSERR("hflex stack");
+ dx1 = s[0];
+ dx2 = s[1];
+ dy2 = s[2];
+ dx3 = s[3];
+ dx4 = s[4];
+ dx5 = s[5];
+ dx6 = s[6];
+ stbtt__csctx_rccurve_to(c, dx1, 0, dx2, dy2, dx3, 0);
+ stbtt__csctx_rccurve_to(c, dx4, 0, dx5, -dy2, dx6, 0);
+ break;
+
+ case 0x23: // flex
+ if (sp < 13) return STBTT__CSERR("flex stack");
+ dx1 = s[0];
+ dy1 = s[1];
+ dx2 = s[2];
+ dy2 = s[3];
+ dx3 = s[4];
+ dy3 = s[5];
+ dx4 = s[6];
+ dy4 = s[7];
+ dx5 = s[8];
+ dy5 = s[9];
+ dx6 = s[10];
+ dy6 = s[11];
+ //fd is s[12]
+ stbtt__csctx_rccurve_to(c, dx1, dy1, dx2, dy2, dx3, dy3);
+ stbtt__csctx_rccurve_to(c, dx4, dy4, dx5, dy5, dx6, dy6);
+ break;
+
+ case 0x24: // hflex1
+ if (sp < 9) return STBTT__CSERR("hflex1 stack");
+ dx1 = s[0];
+ dy1 = s[1];
+ dx2 = s[2];
+ dy2 = s[3];
+ dx3 = s[4];
+ dx4 = s[5];
+ dx5 = s[6];
+ dy5 = s[7];
+ dx6 = s[8];
+ stbtt__csctx_rccurve_to(c, dx1, dy1, dx2, dy2, dx3, 0);
+ stbtt__csctx_rccurve_to(c, dx4, 0, dx5, dy5, dx6, -(dy1+dy2+dy5));
+ break;
+
+ case 0x25: // flex1
+ if (sp < 11) return STBTT__CSERR("flex1 stack");
+ dx1 = s[0];
+ dy1 = s[1];
+ dx2 = s[2];
+ dy2 = s[3];
+ dx3 = s[4];
+ dy3 = s[5];
+ dx4 = s[6];
+ dy4 = s[7];
+ dx5 = s[8];
+ dy5 = s[9];
+ dx6 = dy6 = s[10];
+ dx = dx1+dx2+dx3+dx4+dx5;
+ dy = dy1+dy2+dy3+dy4+dy5;
+ if (STBTT_fabs(dx) > STBTT_fabs(dy))
+ dy6 = -dy;
+ else
+ dx6 = -dx;
+ stbtt__csctx_rccurve_to(c, dx1, dy1, dx2, dy2, dx3, dy3);
+ stbtt__csctx_rccurve_to(c, dx4, dy4, dx5, dy5, dx6, dy6);
+ break;
+
+ default:
+ return STBTT__CSERR("unimplemented");
+ }
+ } break;
+
+ default:
+ if (b0 != 255 && b0 != 28 && b0 < 32)
+ return STBTT__CSERR("reserved operator");
+
+ // push immediate
+ if (b0 == 255) {
+ f = (float)(stbtt_int32)stbtt__buf_get32(&b) / 0x10000;
+ } else {
+ stbtt__buf_skip(&b, -1);
+ f = (float)(stbtt_int16)stbtt__cff_int(&b);
+ }
+ if (sp >= 48) return STBTT__CSERR("push stack overflow");
+ s[sp++] = f;
+ clear_stack = 0;
+ break;
+ }
+ if (clear_stack) sp = 0;
+ }
+ return STBTT__CSERR("no endchar");
+
+#undef STBTT__CSERR
+}
+
+static int stbtt__GetGlyphShapeT2(const stbtt_fontinfo *info, int glyph_index, stbtt_vertex **pvertices)
+{
+ // runs the charstring twice, once to count and once to output (to avoid realloc)
+ stbtt__csctx count_ctx = STBTT__CSCTX_INIT(1);
+ stbtt__csctx output_ctx = STBTT__CSCTX_INIT(0);
+ if (stbtt__run_charstring(info, glyph_index, &count_ctx)) {
+ *pvertices = (stbtt_vertex*)STBTT_malloc(count_ctx.num_vertices*sizeof(stbtt_vertex), info->userdata);
+ output_ctx.pvertices = *pvertices;
+ if (stbtt__run_charstring(info, glyph_index, &output_ctx)) {
+ STBTT_assert(output_ctx.num_vertices == count_ctx.num_vertices);
+ return output_ctx.num_vertices;
+ }
+ }
+ *pvertices = NULL;
+ return 0;
+}
+
+static int stbtt__GetGlyphInfoT2(const stbtt_fontinfo *info, int glyph_index, int *x0, int *y0, int *x1, int *y1)
+{
+ stbtt__csctx c = STBTT__CSCTX_INIT(1);
+ int r = stbtt__run_charstring(info, glyph_index, &c);
+ if (x0) *x0 = r ? c.min_x : 0;
+ if (y0) *y0 = r ? c.min_y : 0;
+ if (x1) *x1 = r ? c.max_x : 0;
+ if (y1) *y1 = r ? c.max_y : 0;
+ return r ? c.num_vertices : 0;
+}
+
+STBTT_DEF int stbtt_GetGlyphShape(const stbtt_fontinfo *info, int glyph_index, stbtt_vertex **pvertices)
+{
+ if (!info->cff.size)
+ return stbtt__GetGlyphShapeTT(info, glyph_index, pvertices);
+ else
+ return stbtt__GetGlyphShapeT2(info, glyph_index, pvertices);
+}
+
+STBTT_DEF void stbtt_GetGlyphHMetrics(const stbtt_fontinfo *info, int glyph_index, int *advanceWidth, int *leftSideBearing)
+{
+ stbtt_uint16 numOfLongHorMetrics = ttUSHORT(info->data+info->hhea + 34);
+ if (glyph_index < numOfLongHorMetrics) {
+ if (advanceWidth) *advanceWidth = ttSHORT(info->data + info->hmtx + 4*glyph_index);
+ if (leftSideBearing) *leftSideBearing = ttSHORT(info->data + info->hmtx + 4*glyph_index + 2);
+ } else {
+ if (advanceWidth) *advanceWidth = ttSHORT(info->data + info->hmtx + 4*(numOfLongHorMetrics-1));
+ if (leftSideBearing) *leftSideBearing = ttSHORT(info->data + info->hmtx + 4*numOfLongHorMetrics + 2*(glyph_index - numOfLongHorMetrics));
+ }
+}
+
+STBTT_DEF int stbtt_GetKerningTableLength(const stbtt_fontinfo *info)
+{
+ stbtt_uint8 *data = info->data + info->kern;
+
+ // we only look at the first table. it must be 'horizontal' and format 0.
+ if (!info->kern)
+ return 0;
+ if (ttUSHORT(data+2) < 1) // number of tables, need at least 1
+ return 0;
+ if (ttUSHORT(data+8) != 1) // horizontal flag must be set in format
+ return 0;
+
+ return ttUSHORT(data+10);
+}
+
+STBTT_DEF int stbtt_GetKerningTable(const stbtt_fontinfo *info, stbtt_kerningentry* table, int table_length)
+{
+ stbtt_uint8 *data = info->data + info->kern;
+ int k, length;
+
+ // we only look at the first table. it must be 'horizontal' and format 0.
+ if (!info->kern)
+ return 0;
+ if (ttUSHORT(data+2) < 1) // number of tables, need at least 1
+ return 0;
+ if (ttUSHORT(data+8) != 1) // horizontal flag must be set in format
+ return 0;
+
+ length = ttUSHORT(data+10);
+ if (table_length < length)
+ length = table_length;
+
+ for (k = 0; k < length; k++)
+ {
+ table[k].glyph1 = ttUSHORT(data+18+(k*6));
+ table[k].glyph2 = ttUSHORT(data+20+(k*6));
+ table[k].advance = ttSHORT(data+22+(k*6));
+ }
+
+ return length;
+}
+
+static int stbtt__GetGlyphKernInfoAdvance(const stbtt_fontinfo *info, int glyph1, int glyph2)
+{
+ stbtt_uint8 *data = info->data + info->kern;
+ stbtt_uint32 needle, straw;
+ int l, r, m;
+
+ // we only look at the first table. it must be 'horizontal' and format 0.
+ if (!info->kern)
+ return 0;
+ if (ttUSHORT(data+2) < 1) // number of tables, need at least 1
+ return 0;
+ if (ttUSHORT(data+8) != 1) // horizontal flag must be set in format
+ return 0;
+
+ l = 0;
+ r = ttUSHORT(data+10) - 1;
+ needle = glyph1 << 16 | glyph2;
+ while (l <= r) {
+ m = (l + r) >> 1;
+ straw = ttULONG(data+18+(m*6)); // note: unaligned read
+ if (needle < straw)
+ r = m - 1;
+ else if (needle > straw)
+ l = m + 1;
+ else
+ return ttSHORT(data+22+(m*6));
+ }
+ return 0;
+}
+
+static stbtt_int32 stbtt__GetCoverageIndex(stbtt_uint8 *coverageTable, int glyph)
+{
+ stbtt_uint16 coverageFormat = ttUSHORT(coverageTable);
+ switch (coverageFormat) {
+ case 1: {
+ stbtt_uint16 glyphCount = ttUSHORT(coverageTable + 2);
+
+ // Binary search.
+ stbtt_int32 l=0, r=glyphCount-1, m;
+ int straw, needle=glyph;
+ while (l <= r) {
+ stbtt_uint8 *glyphArray = coverageTable + 4;
+ stbtt_uint16 glyphID;
+ m = (l + r) >> 1;
+ glyphID = ttUSHORT(glyphArray + 2 * m);
+ straw = glyphID;
+ if (needle < straw)
+ r = m - 1;
+ else if (needle > straw)
+ l = m + 1;
+ else {
+ return m;
+ }
+ }
+ break;
+ }
+
+ case 2: {
+ stbtt_uint16 rangeCount = ttUSHORT(coverageTable + 2);
+ stbtt_uint8 *rangeArray = coverageTable + 4;
+
+ // Binary search.
+ stbtt_int32 l=0, r=rangeCount-1, m;
+ int strawStart, strawEnd, needle=glyph;
+ while (l <= r) {
+ stbtt_uint8 *rangeRecord;
+ m = (l + r) >> 1;
+ rangeRecord = rangeArray + 6 * m;
+ strawStart = ttUSHORT(rangeRecord);
+ strawEnd = ttUSHORT(rangeRecord + 2);
+ if (needle < strawStart)
+ r = m - 1;
+ else if (needle > strawEnd)
+ l = m + 1;
+ else {
+ stbtt_uint16 startCoverageIndex = ttUSHORT(rangeRecord + 4);
+ return startCoverageIndex + glyph - strawStart;
+ }
+ }
+ break;
+ }
+
+ default: return -1; // unsupported
+ }
+
+ return -1;
+}
+
+static stbtt_int32 stbtt__GetGlyphClass(stbtt_uint8 *classDefTable, int glyph)
+{
+ stbtt_uint16 classDefFormat = ttUSHORT(classDefTable);
+ switch (classDefFormat)
+ {
+ case 1: {
+ stbtt_uint16 startGlyphID = ttUSHORT(classDefTable + 2);
+ stbtt_uint16 glyphCount = ttUSHORT(classDefTable + 4);
+ stbtt_uint8 *classDef1ValueArray = classDefTable + 6;
+
+ if (glyph >= startGlyphID && glyph < startGlyphID + glyphCount)
+ return (stbtt_int32)ttUSHORT(classDef1ValueArray + 2 * (glyph - startGlyphID));
+ break;
+ }
+
+ case 2: {
+ stbtt_uint16 classRangeCount = ttUSHORT(classDefTable + 2);
+ stbtt_uint8 *classRangeRecords = classDefTable + 4;
+
+ // Binary search.
+ stbtt_int32 l=0, r=classRangeCount-1, m;
+ int strawStart, strawEnd, needle=glyph;
+ while (l <= r) {
+ stbtt_uint8 *classRangeRecord;
+ m = (l + r) >> 1;
+ classRangeRecord = classRangeRecords + 6 * m;
+ strawStart = ttUSHORT(classRangeRecord);
+ strawEnd = ttUSHORT(classRangeRecord + 2);
+ if (needle < strawStart)
+ r = m - 1;
+ else if (needle > strawEnd)
+ l = m + 1;
+ else
+ return (stbtt_int32)ttUSHORT(classRangeRecord + 4);
+ }
+ break;
+ }
+
+ default:
+ return -1; // Unsupported definition type, return an error.
+ }
+
+ // "All glyphs not assigned to a class fall into class 0". (OpenType spec)
+ return 0;
+}
+
+// Define to STBTT_assert(x) if you want to break on unimplemented formats.
+#define STBTT_GPOS_TODO_assert(x)
+
+static stbtt_int32 stbtt__GetGlyphGPOSInfoAdvance(const stbtt_fontinfo *info, int glyph1, int glyph2)
+{
+ stbtt_uint16 lookupListOffset;
+ stbtt_uint8 *lookupList;
+ stbtt_uint16 lookupCount;
+ stbtt_uint8 *data;
+ stbtt_int32 i, sti;
+
+ if (!info->gpos) return 0;
+
+ data = info->data + info->gpos;
+
+ if (ttUSHORT(data+0) != 1) return 0; // Major version 1
+ if (ttUSHORT(data+2) != 0) return 0; // Minor version 0
+
+ lookupListOffset = ttUSHORT(data+8);
+ lookupList = data + lookupListOffset;
+ lookupCount = ttUSHORT(lookupList);
+
+   for (i=0; i<lookupCount; ++i) {
+      stbtt_uint16 lookupOffset = ttUSHORT(lookupList + 2 + 2 * i);
+      stbtt_uint8 *lookupTable = lookupList + lookupOffset;
+
+      stbtt_uint16 lookupType = ttUSHORT(lookupTable);
+      stbtt_uint16 subTableCount = ttUSHORT(lookupTable + 4);
+      stbtt_uint8 *subTableOffsets = lookupTable + 6;
+      if (lookupType != 2) // Pair Adjustment Positioning Subtable
+         continue;
+
+      for (sti=0; sti<subTableCount; sti++) {
+         stbtt_uint16 subtableOffset = ttUSHORT(subTableOffsets + 2 * sti);
+         stbtt_uint8 *table = lookupTable + subtableOffset;
+         stbtt_uint16 posFormat = ttUSHORT(table);
+         stbtt_uint16 coverageOffset = ttUSHORT(table + 2);
+         stbtt_int32 coverageIndex = stbtt__GetCoverageIndex(table + coverageOffset, glyph1);
+         if (coverageIndex == -1) continue;
+
+         switch (posFormat) {
+            case 1: {
+               stbtt_int32 l, r, m;
+               int straw, needle;
+               stbtt_uint16 valueFormat1 = ttUSHORT(table + 4);
+               stbtt_uint16 valueFormat2 = ttUSHORT(table + 6);
+               if (valueFormat1 == 4 && valueFormat2 == 0) { // Support more formats?
+                  stbtt_int32 valueRecordPairSizeInBytes = 2;
+                  stbtt_uint16 pairSetCount = ttUSHORT(table + 8);
+                  stbtt_uint16 pairPosOffset = ttUSHORT(table + 10 + 2 * coverageIndex);
+                  stbtt_uint8 *pairValueTable = table + pairPosOffset;
+                  stbtt_uint16 pairValueCount = ttUSHORT(pairValueTable);
+                  stbtt_uint8 *pairValueArray = pairValueTable + 2;
+
+                  if (coverageIndex >= pairSetCount) return 0;
+
+ needle=glyph2;
+ r=pairValueCount-1;
+ l=0;
+
+ // Binary search.
+ while (l <= r) {
+ stbtt_uint16 secondGlyph;
+ stbtt_uint8 *pairValue;
+ m = (l + r) >> 1;
+ pairValue = pairValueArray + (2 + valueRecordPairSizeInBytes) * m;
+ secondGlyph = ttUSHORT(pairValue);
+ straw = secondGlyph;
+ if (needle < straw)
+ r = m - 1;
+ else if (needle > straw)
+ l = m + 1;
+ else {
+ stbtt_int16 xAdvance = ttSHORT(pairValue + 2);
+ return xAdvance;
+ }
+ }
+ } else
+ return 0;
+ break;
+ }
+
+ case 2: {
+ stbtt_uint16 valueFormat1 = ttUSHORT(table + 4);
+ stbtt_uint16 valueFormat2 = ttUSHORT(table + 6);
+ if (valueFormat1 == 4 && valueFormat2 == 0) { // Support more formats?
+ stbtt_uint16 classDef1Offset = ttUSHORT(table + 8);
+ stbtt_uint16 classDef2Offset = ttUSHORT(table + 10);
+ int glyph1class = stbtt__GetGlyphClass(table + classDef1Offset, glyph1);
+ int glyph2class = stbtt__GetGlyphClass(table + classDef2Offset, glyph2);
+
+ stbtt_uint16 class1Count = ttUSHORT(table + 12);
+ stbtt_uint16 class2Count = ttUSHORT(table + 14);
+ stbtt_uint8 *class1Records, *class2Records;
+ stbtt_int16 xAdvance;
+
+ if (glyph1class < 0 || glyph1class >= class1Count) return 0; // malformed
+ if (glyph2class < 0 || glyph2class >= class2Count) return 0; // malformed
+
+ class1Records = table + 16;
+ class2Records = class1Records + 2 * (glyph1class * class2Count);
+ xAdvance = ttSHORT(class2Records + 2 * glyph2class);
+ return xAdvance;
+ } else
+ return 0;
+ break;
+ }
+
+ default:
+ return 0; // Unsupported position format
+ }
+ }
+ }
+
+ return 0;
+}
+
+STBTT_DEF int stbtt_GetGlyphKernAdvance(const stbtt_fontinfo *info, int g1, int g2)
+{
+ int xAdvance = 0;
+
+ if (info->gpos)
+ xAdvance += stbtt__GetGlyphGPOSInfoAdvance(info, g1, g2);
+ else if (info->kern)
+ xAdvance += stbtt__GetGlyphKernInfoAdvance(info, g1, g2);
+
+ return xAdvance;
+}
+
+STBTT_DEF int stbtt_GetCodepointKernAdvance(const stbtt_fontinfo *info, int ch1, int ch2)
+{
+ if (!info->kern && !info->gpos) // if no kerning table, don't waste time looking up both codepoint->glyphs
+ return 0;
+ return stbtt_GetGlyphKernAdvance(info, stbtt_FindGlyphIndex(info,ch1), stbtt_FindGlyphIndex(info,ch2));
+}
+
+STBTT_DEF void stbtt_GetCodepointHMetrics(const stbtt_fontinfo *info, int codepoint, int *advanceWidth, int *leftSideBearing)
+{
+ stbtt_GetGlyphHMetrics(info, stbtt_FindGlyphIndex(info,codepoint), advanceWidth, leftSideBearing);
+}
+
+STBTT_DEF void stbtt_GetFontVMetrics(const stbtt_fontinfo *info, int *ascent, int *descent, int *lineGap)
+{
+ if (ascent ) *ascent = ttSHORT(info->data+info->hhea + 4);
+ if (descent) *descent = ttSHORT(info->data+info->hhea + 6);
+ if (lineGap) *lineGap = ttSHORT(info->data+info->hhea + 8);
+}
+
+STBTT_DEF int stbtt_GetFontVMetricsOS2(const stbtt_fontinfo *info, int *typoAscent, int *typoDescent, int *typoLineGap)
+{
+ int tab = stbtt__find_table(info->data, info->fontstart, "OS/2");
+ if (!tab)
+ return 0;
+ if (typoAscent ) *typoAscent = ttSHORT(info->data+tab + 68);
+ if (typoDescent) *typoDescent = ttSHORT(info->data+tab + 70);
+ if (typoLineGap) *typoLineGap = ttSHORT(info->data+tab + 72);
+ return 1;
+}
+
+STBTT_DEF void stbtt_GetFontBoundingBox(const stbtt_fontinfo *info, int *x0, int *y0, int *x1, int *y1)
+{
+ *x0 = ttSHORT(info->data + info->head + 36);
+ *y0 = ttSHORT(info->data + info->head + 38);
+ *x1 = ttSHORT(info->data + info->head + 40);
+ *y1 = ttSHORT(info->data + info->head + 42);
+}
+
+STBTT_DEF float stbtt_ScaleForPixelHeight(const stbtt_fontinfo *info, float height)
+{
+ int fheight = ttSHORT(info->data + info->hhea + 4) - ttSHORT(info->data + info->hhea + 6);
+ return (float) height / fheight;
+}
+
+STBTT_DEF float stbtt_ScaleForMappingEmToPixels(const stbtt_fontinfo *info, float pixels)
+{
+ int unitsPerEm = ttUSHORT(info->data + info->head + 18);
+ return pixels / unitsPerEm;
+}
+
+STBTT_DEF void stbtt_FreeShape(const stbtt_fontinfo *info, stbtt_vertex *v)
+{
+ STBTT_free(v, info->userdata);
+}
+
+STBTT_DEF stbtt_uint8 *stbtt_FindSVGDoc(const stbtt_fontinfo *info, int gl)
+{
+ int i;
+ stbtt_uint8 *data = info->data;
+ stbtt_uint8 *svg_doc_list = data + stbtt__get_svg((stbtt_fontinfo *) info);
+
+ int numEntries = ttUSHORT(svg_doc_list);
+ stbtt_uint8 *svg_docs = svg_doc_list + 2;
+
+   for(i=0; i<numEntries; i++) {
+      stbtt_uint8 *svg_doc = svg_docs + (12 * i);
+      if ((gl >= ttUSHORT(svg_doc)) && (gl <= ttUSHORT(svg_doc + 2)))
+ return svg_doc;
+ }
+ return 0;
+}
+
+STBTT_DEF int stbtt_GetGlyphSVG(const stbtt_fontinfo *info, int gl, const char **svg)
+{
+ stbtt_uint8 *data = info->data;
+ stbtt_uint8 *svg_doc;
+
+ if (info->svg == 0)
+ return 0;
+
+ svg_doc = stbtt_FindSVGDoc(info, gl);
+ if (svg_doc != NULL) {
+ *svg = (char *) data + info->svg + ttULONG(svg_doc + 4);
+ return ttULONG(svg_doc + 8);
+ } else {
+ return 0;
+ }
+}
+
+STBTT_DEF int stbtt_GetCodepointSVG(const stbtt_fontinfo *info, int unicode_codepoint, const char **svg)
+{
+ return stbtt_GetGlyphSVG(info, stbtt_FindGlyphIndex(info, unicode_codepoint), svg);
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// antialiasing software rasterizer
+//
+
+STBTT_DEF void stbtt_GetGlyphBitmapBoxSubpixel(const stbtt_fontinfo *font, int glyph, float scale_x, float scale_y,float shift_x, float shift_y, int *ix0, int *iy0, int *ix1, int *iy1)
+{
+ int x0=0,y0=0,x1,y1; // =0 suppresses compiler warning
+ if (!stbtt_GetGlyphBox(font, glyph, &x0,&y0,&x1,&y1)) {
+ // e.g. space character
+ if (ix0) *ix0 = 0;
+ if (iy0) *iy0 = 0;
+ if (ix1) *ix1 = 0;
+ if (iy1) *iy1 = 0;
+ } else {
+      // move to integral bboxes (treating pixels as little squares, what pixels get touched?)
+ if (ix0) *ix0 = STBTT_ifloor( x0 * scale_x + shift_x);
+ if (iy0) *iy0 = STBTT_ifloor(-y1 * scale_y + shift_y);
+ if (ix1) *ix1 = STBTT_iceil ( x1 * scale_x + shift_x);
+ if (iy1) *iy1 = STBTT_iceil (-y0 * scale_y + shift_y);
+ }
+}
+
+STBTT_DEF void stbtt_GetGlyphBitmapBox(const stbtt_fontinfo *font, int glyph, float scale_x, float scale_y, int *ix0, int *iy0, int *ix1, int *iy1)
+{
+ stbtt_GetGlyphBitmapBoxSubpixel(font, glyph, scale_x, scale_y,0.0f,0.0f, ix0, iy0, ix1, iy1);
+}
+
+STBTT_DEF void stbtt_GetCodepointBitmapBoxSubpixel(const stbtt_fontinfo *font, int codepoint, float scale_x, float scale_y, float shift_x, float shift_y, int *ix0, int *iy0, int *ix1, int *iy1)
+{
+ stbtt_GetGlyphBitmapBoxSubpixel(font, stbtt_FindGlyphIndex(font,codepoint), scale_x, scale_y,shift_x,shift_y, ix0,iy0,ix1,iy1);
+}
+
+STBTT_DEF void stbtt_GetCodepointBitmapBox(const stbtt_fontinfo *font, int codepoint, float scale_x, float scale_y, int *ix0, int *iy0, int *ix1, int *iy1)
+{
+ stbtt_GetCodepointBitmapBoxSubpixel(font, codepoint, scale_x, scale_y,0.0f,0.0f, ix0,iy0,ix1,iy1);
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Rasterizer
+
+typedef struct stbtt__hheap_chunk
+{
+ struct stbtt__hheap_chunk *next;
+} stbtt__hheap_chunk;
+
+typedef struct stbtt__hheap
+{
+ struct stbtt__hheap_chunk *head;
+ void *first_free;
+ int num_remaining_in_head_chunk;
+} stbtt__hheap;
+
+static void *stbtt__hheap_alloc(stbtt__hheap *hh, size_t size, void *userdata)
+{
+ if (hh->first_free) {
+ void *p = hh->first_free;
+ hh->first_free = * (void **) p;
+ return p;
+ } else {
+ if (hh->num_remaining_in_head_chunk == 0) {
+ int count = (size < 32 ? 2000 : size < 128 ? 800 : 100);
+ stbtt__hheap_chunk *c = (stbtt__hheap_chunk *) STBTT_malloc(sizeof(stbtt__hheap_chunk) + size * count, userdata);
+ if (c == NULL)
+ return NULL;
+ c->next = hh->head;
+ hh->head = c;
+ hh->num_remaining_in_head_chunk = count;
+ }
+ --hh->num_remaining_in_head_chunk;
+ return (char *) (hh->head) + sizeof(stbtt__hheap_chunk) + size * hh->num_remaining_in_head_chunk;
+ }
+}
+
+static void stbtt__hheap_free(stbtt__hheap *hh, void *p)
+{
+ *(void **) p = hh->first_free;
+ hh->first_free = p;
+}
+
+static void stbtt__hheap_cleanup(stbtt__hheap *hh, void *userdata)
+{
+ stbtt__hheap_chunk *c = hh->head;
+ while (c) {
+ stbtt__hheap_chunk *n = c->next;
+ STBTT_free(c, userdata);
+ c = n;
+ }
+}
+
+typedef struct stbtt__edge {
+ float x0,y0, x1,y1;
+ int invert;
+} stbtt__edge;
+
+
+typedef struct stbtt__active_edge
+{
+ struct stbtt__active_edge *next;
+ #if STBTT_RASTERIZER_VERSION==1
+ int x,dx;
+ float ey;
+ int direction;
+ #elif STBTT_RASTERIZER_VERSION==2
+ float fx,fdx,fdy;
+ float direction;
+ float sy;
+ float ey;
+ #else
+ #error "Unrecognized value of STBTT_RASTERIZER_VERSION"
+ #endif
+} stbtt__active_edge;
+
+#if STBTT_RASTERIZER_VERSION == 1
+#define STBTT_FIXSHIFT 10
+#define STBTT_FIX (1 << STBTT_FIXSHIFT)
+#define STBTT_FIXMASK (STBTT_FIX-1)
+
+static stbtt__active_edge *stbtt__new_active(stbtt__hheap *hh, stbtt__edge *e, int off_x, float start_point, void *userdata)
+{
+ stbtt__active_edge *z = (stbtt__active_edge *) stbtt__hheap_alloc(hh, sizeof(*z), userdata);
+ float dxdy = (e->x1 - e->x0) / (e->y1 - e->y0);
+ STBTT_assert(z != NULL);
+ if (!z) return z;
+
+ // round dx down to avoid overshooting
+ if (dxdy < 0)
+ z->dx = -STBTT_ifloor(STBTT_FIX * -dxdy);
+ else
+ z->dx = STBTT_ifloor(STBTT_FIX * dxdy);
+
+ z->x = STBTT_ifloor(STBTT_FIX * e->x0 + z->dx * (start_point - e->y0)); // use z->dx so when we offset later it's by the same amount
+ z->x -= off_x * STBTT_FIX;
+
+ z->ey = e->y1;
+ z->next = 0;
+ z->direction = e->invert ? 1 : -1;
+ return z;
+}
+#elif STBTT_RASTERIZER_VERSION == 2
+static stbtt__active_edge *stbtt__new_active(stbtt__hheap *hh, stbtt__edge *e, int off_x, float start_point, void *userdata)
+{
+ stbtt__active_edge *z = (stbtt__active_edge *) stbtt__hheap_alloc(hh, sizeof(*z), userdata);
+ float dxdy = (e->x1 - e->x0) / (e->y1 - e->y0);
+ STBTT_assert(z != NULL);
+ //STBTT_assert(e->y0 <= start_point);
+ if (!z) return z;
+ z->fdx = dxdy;
+ z->fdy = dxdy != 0.0f ? (1.0f/dxdy) : 0.0f;
+ z->fx = e->x0 + dxdy * (start_point - e->y0);
+ z->fx -= off_x;
+ z->direction = e->invert ? 1.0f : -1.0f;
+ z->sy = e->y0;
+ z->ey = e->y1;
+ z->next = 0;
+ return z;
+}
+#else
+#error "Unrecognized value of STBTT_RASTERIZER_VERSION"
+#endif
+
+#if STBTT_RASTERIZER_VERSION == 1
+// note: this routine clips fills that extend off the edges... ideally this
+// wouldn't happen, but it could happen if the truetype glyph bounding boxes
+// are wrong, or if the user supplies a too-small bitmap
+static void stbtt__fill_active_edges(unsigned char *scanline, int len, stbtt__active_edge *e, int max_weight)
+{
+ // non-zero winding fill
+ int x0=0, w=0;
+
+ while (e) {
+ if (w == 0) {
+ // if we're currently at zero, we need to record the edge start point
+ x0 = e->x; w += e->direction;
+ } else {
+ int x1 = e->x; w += e->direction;
+ // if we went to zero, we need to draw
+ if (w == 0) {
+ int i = x0 >> STBTT_FIXSHIFT;
+ int j = x1 >> STBTT_FIXSHIFT;
+
+ if (i < len && j >= 0) {
+ if (i == j) {
+ // x0,x1 are the same pixel, so compute combined coverage
+ scanline[i] = scanline[i] + (stbtt_uint8) ((x1 - x0) * max_weight >> STBTT_FIXSHIFT);
+ } else {
+ if (i >= 0) // add antialiasing for x0
+ scanline[i] = scanline[i] + (stbtt_uint8) (((STBTT_FIX - (x0 & STBTT_FIXMASK)) * max_weight) >> STBTT_FIXSHIFT);
+ else
+ i = -1; // clip
+
+ if (j < len) // add antialiasing for x1
+ scanline[j] = scanline[j] + (stbtt_uint8) (((x1 & STBTT_FIXMASK) * max_weight) >> STBTT_FIXSHIFT);
+ else
+ j = len; // clip
+
+ for (++i; i < j; ++i) // fill pixels between x0 and x1
+ scanline[i] = scanline[i] + (stbtt_uint8) max_weight;
+ }
+ }
+ }
+ }
+
+ e = e->next;
+ }
+}
+
+static void stbtt__rasterize_sorted_edges(stbtt__bitmap *result, stbtt__edge *e, int n, int vsubsample, int off_x, int off_y, void *userdata)
+{
+ stbtt__hheap hh = { 0, 0, 0 };
+ stbtt__active_edge *active = NULL;
+ int y,j=0;
+ int max_weight = (255 / vsubsample); // weight per vertical scanline
+ int s; // vertical subsample index
+ unsigned char scanline_data[512], *scanline;
+
+ if (result->w > 512)
+ scanline = (unsigned char *) STBTT_malloc(result->w, userdata);
+ else
+ scanline = scanline_data;
+
+ y = off_y * vsubsample;
+ e[n].y0 = (off_y + result->h) * (float) vsubsample + 1;
+
+ while (j < result->h) {
+ STBTT_memset(scanline, 0, result->w);
+ for (s=0; s < vsubsample; ++s) {
+ // find center of pixel for this scanline
+ float scan_y = y + 0.5f;
+ stbtt__active_edge **step = &active;
+
+ // update all active edges;
+ // remove all active edges that terminate before the center of this scanline
+ while (*step) {
+ stbtt__active_edge * z = *step;
+ if (z->ey <= scan_y) {
+ *step = z->next; // delete from list
+ STBTT_assert(z->direction);
+ z->direction = 0;
+ stbtt__hheap_free(&hh, z);
+ } else {
+ z->x += z->dx; // advance to position for current scanline
+ step = &((*step)->next); // advance through list
+ }
+ }
+
+ // resort the list if needed
+ for(;;) {
+ int changed=0;
+ step = &active;
+ while (*step && (*step)->next) {
+ if ((*step)->x > (*step)->next->x) {
+ stbtt__active_edge *t = *step;
+ stbtt__active_edge *q = t->next;
+
+ t->next = q->next;
+ q->next = t;
+ *step = q;
+ changed = 1;
+ }
+ step = &(*step)->next;
+ }
+ if (!changed) break;
+ }
+
+ // insert all edges that start before the center of this scanline -- omit ones that also end on this scanline
+ while (e->y0 <= scan_y) {
+ if (e->y1 > scan_y) {
+ stbtt__active_edge *z = stbtt__new_active(&hh, e, off_x, scan_y, userdata);
+ if (z != NULL) {
+ // find insertion point
+ if (active == NULL)
+ active = z;
+ else if (z->x < active->x) {
+ // insert at front
+ z->next = active;
+ active = z;
+ } else {
+ // find thing to insert AFTER
+ stbtt__active_edge *p = active;
+ while (p->next && p->next->x < z->x)
+ p = p->next;
+ // at this point, p->next->x is NOT < z->x
+ z->next = p->next;
+ p->next = z;
+ }
+ }
+ }
+ ++e;
+ }
+
+ // now process all active edges in XOR fashion
+ if (active)
+ stbtt__fill_active_edges(scanline, result->w, active, max_weight);
+
+ ++y;
+ }
+ STBTT_memcpy(result->pixels + j * result->stride, scanline, result->w);
+ ++j;
+ }
+
+ stbtt__hheap_cleanup(&hh, userdata);
+
+ if (scanline != scanline_data)
+ STBTT_free(scanline, userdata);
+}
+
+#elif STBTT_RASTERIZER_VERSION == 2
+
+// the edge passed in here does not cross the vertical line at x or the vertical line at x+1
+// (i.e. it has already been clipped to those)
+static void stbtt__handle_clipped_edge(float *scanline, int x, stbtt__active_edge *e, float x0, float y0, float x1, float y1)
+{
+ if (y0 == y1) return;
+ STBTT_assert(y0 < y1);
+ STBTT_assert(e->sy <= e->ey);
+ if (y0 > e->ey) return;
+ if (y1 < e->sy) return;
+ if (y0 < e->sy) {
+ x0 += (x1-x0) * (e->sy - y0) / (y1-y0);
+ y0 = e->sy;
+ }
+ if (y1 > e->ey) {
+ x1 += (x1-x0) * (e->ey - y1) / (y1-y0);
+ y1 = e->ey;
+ }
+
+ if (x0 == x)
+ STBTT_assert(x1 <= x+1);
+ else if (x0 == x+1)
+ STBTT_assert(x1 >= x);
+ else if (x0 <= x)
+ STBTT_assert(x1 <= x);
+ else if (x0 >= x+1)
+ STBTT_assert(x1 >= x+1);
+ else
+ STBTT_assert(x1 >= x && x1 <= x+1);
+
+ if (x0 <= x && x1 <= x)
+ scanline[x] += e->direction * (y1-y0);
+ else if (x0 >= x+1 && x1 >= x+1)
+ ;
+ else {
+ STBTT_assert(x0 >= x && x0 <= x+1 && x1 >= x && x1 <= x+1);
+ scanline[x] += e->direction * (y1-y0) * (1-((x0-x)+(x1-x))/2); // coverage = 1 - average x position
+ }
+}
+
+static float stbtt__sized_trapezoid_area(float height, float top_width, float bottom_width)
+{
+ STBTT_assert(top_width >= 0);
+ STBTT_assert(bottom_width >= 0);
+ return (top_width + bottom_width) / 2.0f * height;
+}
+
+static float stbtt__position_trapezoid_area(float height, float tx0, float tx1, float bx0, float bx1)
+{
+ return stbtt__sized_trapezoid_area(height, tx1 - tx0, bx1 - bx0);
+}
+
+static float stbtt__sized_triangle_area(float height, float width)
+{
+ return height * width / 2;
+}
+
+static void stbtt__fill_active_edges_new(float *scanline, float *scanline_fill, int len, stbtt__active_edge *e, float y_top)
+{
+ float y_bottom = y_top+1;
+
+ while (e) {
+ // brute force every pixel
+
+ // compute intersection points with top & bottom
+ STBTT_assert(e->ey >= y_top);
+
+ if (e->fdx == 0) {
+ float x0 = e->fx;
+ if (x0 < len) {
+ if (x0 >= 0) {
+ stbtt__handle_clipped_edge(scanline,(int) x0,e, x0,y_top, x0,y_bottom);
+ stbtt__handle_clipped_edge(scanline_fill-1,(int) x0+1,e, x0,y_top, x0,y_bottom);
+ } else {
+ stbtt__handle_clipped_edge(scanline_fill-1,0,e, x0,y_top, x0,y_bottom);
+ }
+ }
+ } else {
+ float x0 = e->fx;
+ float dx = e->fdx;
+ float xb = x0 + dx;
+ float x_top, x_bottom;
+ float sy0,sy1;
+ float dy = e->fdy;
+ STBTT_assert(e->sy <= y_bottom && e->ey >= y_top);
+
+         // compute endpoints of line segment clipped to this scanline (if the
+         // line segment starts on this scanline). x0 is the intersection of the
+         // line with y_top, but that may be off the line segment.
+ if (e->sy > y_top) {
+ x_top = x0 + dx * (e->sy - y_top);
+ sy0 = e->sy;
+ } else {
+ x_top = x0;
+ sy0 = y_top;
+ }
+ if (e->ey < y_bottom) {
+ x_bottom = x0 + dx * (e->ey - y_top);
+ sy1 = e->ey;
+ } else {
+ x_bottom = xb;
+ sy1 = y_bottom;
+ }
+
+ if (x_top >= 0 && x_bottom >= 0 && x_top < len && x_bottom < len) {
+ // from here on, we don't have to range check x values
+
+ if ((int) x_top == (int) x_bottom) {
+ float height;
+ // simple case, only spans one pixel
+ int x = (int) x_top;
+ height = (sy1 - sy0) * e->direction;
+ STBTT_assert(x >= 0 && x < len);
+ scanline[x] += stbtt__position_trapezoid_area(height, x_top, x+1.0f, x_bottom, x+1.0f);
+ scanline_fill[x] += height; // everything right of this pixel is filled
+ } else {
+ int x,x1,x2;
+ float y_crossing, y_final, step, sign, area;
+ // covers 2+ pixels
+ if (x_top > x_bottom) {
+ // flip scanline vertically; signed area is the same
+ float t;
+ sy0 = y_bottom - (sy0 - y_top);
+ sy1 = y_bottom - (sy1 - y_top);
+ t = sy0, sy0 = sy1, sy1 = t;
+ t = x_bottom, x_bottom = x_top, x_top = t;
+ dx = -dx;
+ dy = -dy;
+ t = x0, x0 = xb, xb = t;
+ }
+ STBTT_assert(dy >= 0);
+ STBTT_assert(dx >= 0);
+
+ x1 = (int) x_top;
+ x2 = (int) x_bottom;
+ // compute intersection with y axis at x1+1
+ y_crossing = y_top + dy * (x1+1 - x0);
+
+ // compute intersection with y axis at x2
+ y_final = y_top + dy * (x2 - x0);
+
+ // x1 x_top x2 x_bottom
+ // y_top +------|-----+------------+------------+--------|---+------------+
+ // | | | | | |
+ // | | | | | |
+ // sy0 | Txxxxx|............|............|............|............|
+ // y_crossing | *xxxxx.......|............|............|............|
+ // | | xxxxx..|............|............|............|
+ // | | /- xx*xxxx........|............|............|
+ // | | dy < | xxxxxx..|............|............|
+ // y_final | | \- | xx*xxx.........|............|
+ // sy1 | | | | xxxxxB...|............|
+ // | | | | | |
+ // | | | | | |
+ // y_bottom +------------+------------+------------+------------+------------+
+ //
+ // goal is to measure the area covered by '.' in each pixel
+
+ // if x2 is right at the right edge of x1, y_crossing can blow up, github #1057
+ // @TODO: maybe test against sy1 rather than y_bottom?
+ if (y_crossing > y_bottom)
+ y_crossing = y_bottom;
+
+ sign = e->direction;
+
+ // area of the rectangle covered from sy0..y_crossing
+ area = sign * (y_crossing-sy0);
+
+ // area of the triangle (x_top,sy0), (x1+1,sy0), (x1+1,y_crossing)
+ scanline[x1] += stbtt__sized_triangle_area(area, x1+1 - x_top);
+
+ // check if final y_crossing is blown up; no test case for this
+ if (y_final > y_bottom) {
+ y_final = y_bottom;
+ dy = (y_final - y_crossing ) / (x2 - (x1+1)); // if denom=0, y_final = y_crossing, so y_final <= y_bottom
+ }
+
+ // in second pixel, area covered by line segment found in first pixel
+ // is always a rectangle 1 wide * the height of that line segment; this
+ // is exactly what the variable 'area' stores. it also gets a contribution
+ // from the line segment within it. the THIRD pixel will get the first
+ // pixel's rectangle contribution, the second pixel's rectangle contribution,
+ // and its own contribution. the 'own contribution' is the same in every pixel except
+ // the leftmost and rightmost, a trapezoid that slides down in each pixel.
+ // the second pixel's contribution to the third pixel will be the
+ // rectangle 1 wide times the height change in the second pixel, which is dy.
+
+ step = sign * dy * 1; // dy is dy/dx, change in y for every 1 change in x,
+ // which multiplied by 1-pixel-width is how much pixel area changes for each step in x
+ // so the area advances by 'step' every time
+
+ for (x = x1+1; x < x2; ++x) {
+ scanline[x] += area + step/2; // area of trapezoid is 1*step/2
+ area += step;
+ }
+ STBTT_assert(STBTT_fabs(area) <= 1.01f); // accumulated error from area += step unless we round step down
+ STBTT_assert(sy1 > y_final-0.01f);
+
+ // area covered in the last pixel is the rectangle from all the pixels to the left,
+ // plus the trapezoid filled by the line segment in this pixel all the way to the right edge
+ scanline[x2] += area + sign * stbtt__position_trapezoid_area(sy1-y_final, (float) x2, x2+1.0f, x_bottom, x2+1.0f);
+
+ // the rest of the line is filled based on the total height of the line segment in this pixel
+ scanline_fill[x2] += sign * (sy1-sy0);
+ }
+ } else {
+ // if edge goes outside of box we're drawing, we require
+ // clipping logic. since this does not match the intended use
+ // of this library, we use a different, very slow brute
+ // force implementation
+ // note though that this does happen some of the time because
+ // x_top and x_bottom can be extrapolated at the top & bottom of
+ // the shape and actually lie outside the bounding box
+ int x;
+ for (x=0; x < len; ++x) {
+ // cases:
+ //
+ // there can be up to two intersections with the pixel. any intersection
+ // with left or right edges can be handled by splitting into two (or three)
+ // regions. intersections with top & bottom do not necessitate case-wise logic.
+ //
+ // the old way of doing this found the intersections with the left & right edges,
+ // then used some simple logic to produce up to three segments in sorted order
+ // from top-to-bottom. however, this had a problem: if an x edge was epsilon
+ // across the x border, then the corresponding y position might not be distinct
+               // from the other y segment, and it might be ignored as an empty segment. to avoid
+ // that, we need to explicitly produce segments based on x positions.
+
+ // rename variables to clearly-defined pairs
+ float y0 = y_top;
+ float x1 = (float) (x);
+ float x2 = (float) (x+1);
+ float x3 = xb;
+ float y3 = y_bottom;
+
+ // x = e->x + e->dx * (y-y_top)
+ // (y-y_top) = (x - e->x) / e->dx
+ // y = (x - e->x) / e->dx + y_top
+ float y1 = (x - x0) / dx + y_top;
+ float y2 = (x+1 - x0) / dx + y_top;
+
+ if (x0 < x1 && x3 > x2) { // three segments descending down-right
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x1,y1);
+ stbtt__handle_clipped_edge(scanline,x,e, x1,y1, x2,y2);
+ stbtt__handle_clipped_edge(scanline,x,e, x2,y2, x3,y3);
+ } else if (x3 < x1 && x0 > x2) { // three segments descending down-left
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x2,y2);
+ stbtt__handle_clipped_edge(scanline,x,e, x2,y2, x1,y1);
+ stbtt__handle_clipped_edge(scanline,x,e, x1,y1, x3,y3);
+ } else if (x0 < x1 && x3 > x1) { // two segments across x, down-right
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x1,y1);
+ stbtt__handle_clipped_edge(scanline,x,e, x1,y1, x3,y3);
+ } else if (x3 < x1 && x0 > x1) { // two segments across x, down-left
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x1,y1);
+ stbtt__handle_clipped_edge(scanline,x,e, x1,y1, x3,y3);
+ } else if (x0 < x2 && x3 > x2) { // two segments across x+1, down-right
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x2,y2);
+ stbtt__handle_clipped_edge(scanline,x,e, x2,y2, x3,y3);
+ } else if (x3 < x2 && x0 > x2) { // two segments across x+1, down-left
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x2,y2);
+ stbtt__handle_clipped_edge(scanline,x,e, x2,y2, x3,y3);
+ } else { // one segment
+ stbtt__handle_clipped_edge(scanline,x,e, x0,y0, x3,y3);
+ }
+ }
+ }
+ }
+ e = e->next;
+ }
+}
+
+// directly AA rasterize edges w/o supersampling
+static void stbtt__rasterize_sorted_edges(stbtt__bitmap *result, stbtt__edge *e, int n, int vsubsample, int off_x, int off_y, void *userdata)
+{
+ stbtt__hheap hh = { 0, 0, 0 };
+ stbtt__active_edge *active = NULL;
+ int y,j=0, i;
+ float scanline_data[129], *scanline, *scanline2;
+
+ STBTT__NOTUSED(vsubsample);
+
+ if (result->w > 64)
+ scanline = (float *) STBTT_malloc((result->w*2+1) * sizeof(float), userdata);
+ else
+ scanline = scanline_data;
+
+ scanline2 = scanline + result->w;
+
+ y = off_y;
+ e[n].y0 = (float) (off_y + result->h) + 1;
+
+ while (j < result->h) {
+ // find center of pixel for this scanline
+ float scan_y_top = y + 0.0f;
+ float scan_y_bottom = y + 1.0f;
+ stbtt__active_edge **step = &active;
+
+ STBTT_memset(scanline , 0, result->w*sizeof(scanline[0]));
+ STBTT_memset(scanline2, 0, (result->w+1)*sizeof(scanline[0]));
+
+ // update all active edges;
+ // remove all active edges that terminate before the top of this scanline
+ while (*step) {
+ stbtt__active_edge * z = *step;
+ if (z->ey <= scan_y_top) {
+ *step = z->next; // delete from list
+ STBTT_assert(z->direction);
+ z->direction = 0;
+ stbtt__hheap_free(&hh, z);
+ } else {
+ step = &((*step)->next); // advance through list
+ }
+ }
+
+ // insert all edges that start before the bottom of this scanline
+ while (e->y0 <= scan_y_bottom) {
+ if (e->y0 != e->y1) {
+ stbtt__active_edge *z = stbtt__new_active(&hh, e, off_x, scan_y_top, userdata);
+ if (z != NULL) {
+ if (j == 0 && off_y != 0) {
+ if (z->ey < scan_y_top) {
+ // this can happen due to subpixel positioning and some kind of fp rounding error i think
+ z->ey = scan_y_top;
+ }
+ }
+ STBTT_assert(z->ey >= scan_y_top); // if we get really unlucky a tiny bit of an edge can be out of bounds
+ // insert at front
+ z->next = active;
+ active = z;
+ }
+ }
+ ++e;
+ }
+
+ // now process all active edges
+ if (active)
+ stbtt__fill_active_edges_new(scanline, scanline2+1, result->w, active, scan_y_top);
+
+ {
+ float sum = 0;
+ for (i=0; i < result->w; ++i) {
+ float k;
+ int m;
+ sum += scanline2[i];
+ k = scanline[i] + sum;
+ k = (float) STBTT_fabs(k)*255 + 0.5f;
+ m = (int) k;
+ if (m > 255) m = 255;
+ result->pixels[j*result->stride + i] = (unsigned char) m;
+ }
+ }
+ // advance all the edges
+ step = &active;
+ while (*step) {
+ stbtt__active_edge *z = *step;
+ z->fx += z->fdx; // advance to position for current scanline
+ step = &((*step)->next); // advance through list
+ }
+
+ ++y;
+ ++j;
+ }
+
+ stbtt__hheap_cleanup(&hh, userdata);
+
+ if (scanline != scanline_data)
+ STBTT_free(scanline, userdata);
+}
+#else
+#error "Unrecognized value of STBTT_RASTERIZER_VERSION"
+#endif
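The per-pixel loop in the version-2 rasterizer above resolves two accumulation buffers: one holding each pixel's own partial coverage, and one holding running-sum deltas for spans covered from that pixel rightward. A minimal standalone sketch of just that resolve step (the names `direct`, `delta`, and `resolve_scanline` are illustrative, not stb_truetype's):

```c
#include <assert.h>
#include <math.h>

/* Resolve two scanline accumulation buffers into 8-bit coverage:
 * "direct" holds per-pixel partial coverage, "delta" holds increments
 * to a running sum for spans fully covered from a pixel rightward.
 * Coverage = |direct[i] + prefix_sum(delta[0..i])|, scaled to 0..255. */
static void resolve_scanline(const float *direct, const float *delta,
                             unsigned char *out, int w)
{
   float sum = 0;
   int i;
   for (i = 0; i < w; ++i) {
      int m;
      sum += delta[i];
      m = (int) (fabsf(direct[i] + sum) * 255 + 0.5f);
      if (m > 255) m = 255;
      out[i] = (unsigned char) m;
   }
}
```

The absolute value implements the signed-area (nonzero-style) winding accumulation: edges of opposite direction contribute with opposite signs and cancel.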
+
+#define STBTT__COMPARE(a,b) ((a)->y0 < (b)->y0)
+
+static void stbtt__sort_edges_ins_sort(stbtt__edge *p, int n)
+{
+ int i,j;
+ for (i=1; i < n; ++i) {
+ stbtt__edge t = p[i], *a = &t;
+ j = i;
+ while (j > 0) {
+ stbtt__edge *b = &p[j-1];
+ int c = STBTT__COMPARE(a,b);
+ if (!c) break;
+ p[j] = p[j-1];
+ --j;
+ }
+ if (i != j)
+ p[j] = t;
+ }
+}
+
+static void stbtt__sort_edges_quicksort(stbtt__edge *p, int n)
+{
+ /* threshold for transitioning to insertion sort */
+ while (n > 12) {
+ stbtt__edge t;
+ int c01,c12,c,m,i,j;
+
+ /* compute median of three */
+ m = n >> 1;
+ c01 = STBTT__COMPARE(&p[0],&p[m]);
+ c12 = STBTT__COMPARE(&p[m],&p[n-1]);
+ /* if 0 >= mid >= end, or 0 < mid < end, then use mid */
+ if (c01 != c12) {
+ /* otherwise, we'll need to swap something else to middle */
+ int z;
+ c = STBTT__COMPARE(&p[0],&p[n-1]);
+ /* 0>mid && mid<n: 0>n => n; 0<n => 0 */
+ /* 0<mid && mid>n: 0>n => 0; 0<n => n */
+ z = (c == c12) ? 0 : n-1;
+ t = p[z];
+ p[z] = p[m];
+ p[m] = t;
+ }
+ /* now p[m] is the median-of-three */
+ /* swap it to the beginning so it won't move around */
+ t = p[0];
+ p[0] = p[m];
+ p[m] = t;
+
+ /* partition loop */
+ i=1;
+ j=n-1;
+ for(;;) {
+ /* handling of equality is crucial here */
+ /* for sentinels & efficiency with duplicates */
+ for (;;++i) {
+ if (!STBTT__COMPARE(&p[i], &p[0])) break;
+ }
+ for (;;--j) {
+ if (!STBTT__COMPARE(&p[0], &p[j])) break;
+ }
+ /* make sure we haven't crossed */
+ if (i >= j) break;
+ t = p[i];
+ p[i] = p[j];
+ p[j] = t;
+
+ ++i;
+ --j;
+ }
+ /* recurse on smaller side, iterate on larger */
+ if (j < (n-i)) {
+ stbtt__sort_edges_quicksort(p,j);
+ p = p+i;
+ n = n-i;
+ } else {
+ stbtt__sort_edges_quicksort(p+i, n-i);
+ n = j;
+ }
+ }
+}
+
+static void stbtt__sort_edges(stbtt__edge *p, int n)
+{
+ stbtt__sort_edges_quicksort(p, n);
+ stbtt__sort_edges_ins_sort(p, n);
+}
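stbtt__sort_edges combines the two routines above: quicksort recurses only while a run is longer than 12 elements, leaving small nearly-sorted runs that a single insertion-sort pass finishes cheaply. The same hybrid scheme on plain ints, as an illustrative sketch (function names are mine, not the library's):

```c
#include <assert.h>

/* Quicksort with median-of-three pivot selection that stops recursing
 * below a small threshold, recursing on the smaller partition and
 * iterating on the larger to bound stack depth. */
static void hybrid_quicksort(int *p, int n)
{
   while (n > 12) {
      int t, c01, c12, c, m, i, j;
      m = n >> 1;                 /* median of first, middle, last */
      c01 = p[0] < p[m];
      c12 = p[m] < p[n-1];
      if (c01 != c12) {           /* p[m] isn't the median; swap it in */
         int z;
         c = p[0] < p[n-1];
         z = (c == c12) ? 0 : n-1;
         t = p[z]; p[z] = p[m]; p[m] = t;
      }
      t = p[0]; p[0] = p[m]; p[m] = t;   /* pivot to the front */

      i = 1;
      j = n-1;
      for (;;) {
         for (;; ++i) if (!(p[i] < p[0])) break;
         for (;; --j) if (!(p[0] < p[j])) break;
         if (i >= j) break;
         t = p[i]; p[i] = p[j]; p[j] = t;
         ++i; --j;
      }
      if (j < n - i) { hybrid_quicksort(p, j);       p += i; n -= i; }
      else           { hybrid_quicksort(p + i, n - i);        n  = j; }
   }
}

/* One insertion-sort pass cleans up the short unsorted runs left behind. */
static void hybrid_sort(int *p, int n)
{
   int i, j;
   hybrid_quicksort(p, n);
   for (i = 1; i < n; ++i) {
      int t = p[i];
      j = i;
      while (j > 0 && t < p[j-1]) { p[j] = p[j-1]; --j; }
      if (i != j) p[j] = t;
   }
}
```

The recurse-on-the-smaller-side rule is what keeps worst-case stack depth logarithmic even on adversarial inputs.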
+
+typedef struct
+{
+ float x,y;
+} stbtt__point;
+
+static void stbtt__rasterize(stbtt__bitmap *result, stbtt__point *pts, int *wcount, int windings, float scale_x, float scale_y, float shift_x, float shift_y, int off_x, int off_y, int invert, void *userdata)
+{
+ float y_scale_inv = invert ? -scale_y : scale_y;
+ stbtt__edge *e;
+ int n,i,j,k,m;
+#if STBTT_RASTERIZER_VERSION == 1
+ int vsubsample = result->h < 8 ? 15 : 5;
+#elif STBTT_RASTERIZER_VERSION == 2
+ int vsubsample = 1;
+#else
+ #error "Unrecognized value of STBTT_RASTERIZER_VERSION"
+#endif
+ // vsubsample should divide 255 evenly; otherwise we won't reach full opacity
+
+ // now we have to blow out the windings into explicit edge lists
+ n = 0;
+ for (i=0; i < windings; ++i)
+ n += wcount[i];
+
+ e = (stbtt__edge *) STBTT_malloc(sizeof(*e) * (n+1), userdata); // add an extra one as a sentinel
+ if (e == 0) return;
+ n = 0;
+
+ m=0;
+ for (i=0; i < windings; ++i) {
+ stbtt__point *p = pts + m;
+ m += wcount[i];
+ j = wcount[i]-1;
+ for (k=0; k < wcount[i]; j=k++) {
+ int a=k,b=j;
+ // skip the edge if horizontal
+ if (p[j].y == p[k].y)
+ continue;
+ // add edge from j to k to the list
+ e[n].invert = 0;
+ if (invert ? p[j].y > p[k].y : p[j].y < p[k].y) {
+ e[n].invert = 1;
+ a=j,b=k;
+ }
+ e[n].x0 = p[a].x * scale_x + shift_x;
+ e[n].y0 = (p[a].y * y_scale_inv + shift_y) * vsubsample;
+ e[n].x1 = p[b].x * scale_x + shift_x;
+ e[n].y1 = (p[b].y * y_scale_inv + shift_y) * vsubsample;
+ ++n;
+ }
+ }
+
+ // now sort the edges by their highest point (should snap to integer, and then by x)
+ //STBTT_sort(e, n, sizeof(e[0]), stbtt__edge_compare);
+ stbtt__sort_edges(e, n);
+
+ // now, traverse the scanlines and find the intersections on each scanline, use xor winding rule
+ stbtt__rasterize_sorted_edges(result, e, n, vsubsample, off_x, off_y, userdata);
+
+ STBTT_free(e, userdata);
+}
+
+static void stbtt__add_point(stbtt__point *points, int n, float x, float y)
+{
+ if (!points) return; // during first pass, it's unallocated
+ points[n].x = x;
+ points[n].y = y;
+}
+
+// tessellate until threshold p is happy... @TODO warped to compensate for non-linear stretching
+static int stbtt__tesselate_curve(stbtt__point *points, int *num_points, float x0, float y0, float x1, float y1, float x2, float y2, float objspace_flatness_squared, int n)
+{
+ // midpoint
+ float mx = (x0 + 2*x1 + x2)/4;
+ float my = (y0 + 2*y1 + y2)/4;
+ // versus directly drawn line
+ float dx = (x0+x2)/2 - mx;
+ float dy = (y0+y2)/2 - my;
+ if (n > 16) // 65536 segments on one curve better be enough!
+ return 1;
+ if (dx*dx+dy*dy > objspace_flatness_squared) { // half-pixel error allowed... need to be smaller if AA
+ stbtt__tesselate_curve(points, num_points, x0,y0, (x0+x1)/2.0f,(y0+y1)/2.0f, mx,my, objspace_flatness_squared,n+1);
+ stbtt__tesselate_curve(points, num_points, mx,my, (x1+x2)/2.0f,(y1+y2)/2.0f, x2,y2, objspace_flatness_squared,n+1);
+ } else {
+ stbtt__add_point(points, *num_points,x2,y2);
+ *num_points = *num_points+1;
+ }
+ return 1;
+}
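The flatness test above compares the curve's point at t = 0.5 (which for a quadratic Bézier is (x0 + 2*x1 + x2)/4) against the chord midpoint, subdividing while the deviation exceeds the tolerance. A standalone sketch of the same recursion writing into a flat point buffer (the buffer handling and names are mine; the library's version also supports a counting pass with a NULL buffer):

```c
#include <assert.h>

/* Recursively split a quadratic Bezier until its t=0.5 point lies within
 * sqrt(flatness_sq) of the chord midpoint, then emit the segment endpoint.
 * The caller is assumed to have already emitted the start point. */
static void flatten_quad(float *out_xy, int *num,
                         float x0, float y0, float x1, float y1,
                         float x2, float y2, float flatness_sq, int depth)
{
   float mx = (x0 + 2*x1 + x2) / 4;   /* point on the curve at t = 0.5 */
   float my = (y0 + 2*y1 + y2) / 4;
   float dx = (x0 + x2)/2 - mx;       /* deviation from the chord midpoint */
   float dy = (y0 + y2)/2 - my;
   if (depth > 16)                    /* hard cap: 2^16 segments max */
      return;
   if (dx*dx + dy*dy > flatness_sq) {
      flatten_quad(out_xy, num, x0,y0, (x0+x1)/2,(y0+y1)/2, mx,my, flatness_sq, depth+1);
      flatten_quad(out_xy, num, mx,my, (x1+x2)/2,(y1+y2)/2, x2,y2, flatness_sq, depth+1);
   } else {
      out_xy[*num * 2 + 0] = x2;
      out_xy[*num * 2 + 1] = y2;
      *num += 1;
   }
}
```

Because each subdivision quarters the deviation, the recursion depth grows only logarithmically with the required precision.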
+
+static void stbtt__tesselate_cubic(stbtt__point *points, int *num_points, float x0, float y0, float x1, float y1, float x2, float y2, float x3, float y3, float objspace_flatness_squared, int n)
+{
+ // @TODO this "flatness" calculation is just made-up nonsense that seems to work well enough
+ float dx0 = x1-x0;
+ float dy0 = y1-y0;
+ float dx1 = x2-x1;
+ float dy1 = y2-y1;
+ float dx2 = x3-x2;
+ float dy2 = y3-y2;
+ float dx = x3-x0;
+ float dy = y3-y0;
+ float longlen = (float) (STBTT_sqrt(dx0*dx0+dy0*dy0)+STBTT_sqrt(dx1*dx1+dy1*dy1)+STBTT_sqrt(dx2*dx2+dy2*dy2));
+ float shortlen = (float) STBTT_sqrt(dx*dx+dy*dy);
+ float flatness_squared = longlen*longlen-shortlen*shortlen;
+
+ if (n > 16) // 65536 segments on one curve better be enough!
+ return;
+
+ if (flatness_squared > objspace_flatness_squared) {
+ float x01 = (x0+x1)/2;
+ float y01 = (y0+y1)/2;
+ float x12 = (x1+x2)/2;
+ float y12 = (y1+y2)/2;
+ float x23 = (x2+x3)/2;
+ float y23 = (y2+y3)/2;
+
+ float xa = (x01+x12)/2;
+ float ya = (y01+y12)/2;
+ float xb = (x12+x23)/2;
+ float yb = (y12+y23)/2;
+
+ float mx = (xa+xb)/2;
+ float my = (ya+yb)/2;
+
+ stbtt__tesselate_cubic(points, num_points, x0,y0, x01,y01, xa,ya, mx,my, objspace_flatness_squared,n+1);
+ stbtt__tesselate_cubic(points, num_points, mx,my, xb,yb, x23,y23, x3,y3, objspace_flatness_squared,n+1);
+ } else {
+ stbtt__add_point(points, *num_points,x3,y3);
+ *num_points = *num_points+1;
+ }
+}
+
+// returns number of contours
+static stbtt__point *stbtt_FlattenCurves(stbtt_vertex *vertices, int num_verts, float objspace_flatness, int **contour_lengths, int *num_contours, void *userdata)
+{
+ stbtt__point *points=0;
+ int num_points=0;
+
+ float objspace_flatness_squared = objspace_flatness * objspace_flatness;
+ int i,n=0,start=0, pass;
+
+ // count how many "moves" there are to get the contour count
+ for (i=0; i < num_verts; ++i)
+ if (vertices[i].type == STBTT_vmove)
+ ++n;
+
+ *num_contours = n;
+ if (n == 0) return 0;
+
+ *contour_lengths = (int *) STBTT_malloc(sizeof(**contour_lengths) * n, userdata);
+
+ if (*contour_lengths == 0) {
+ *num_contours = 0;
+ return 0;
+ }
+
+ // make two passes through the points so we don't need to realloc
+ for (pass=0; pass < 2; ++pass) {
+ float x=0,y=0;
+ if (pass == 1) {
+ points = (stbtt__point *) STBTT_malloc(num_points * sizeof(points[0]), userdata);
+ if (points == NULL) goto error;
+ }
+ num_points = 0;
+ n= -1;
+ for (i=0; i < num_verts; ++i) {
+ switch (vertices[i].type) {
+ case STBTT_vmove:
+ // start the next contour
+ if (n >= 0)
+ (*contour_lengths)[n] = num_points - start;
+ ++n;
+ start = num_points;
+
+ x = vertices[i].x, y = vertices[i].y;
+ stbtt__add_point(points, num_points++, x,y);
+ break;
+ case STBTT_vline:
+ x = vertices[i].x, y = vertices[i].y;
+ stbtt__add_point(points, num_points++, x, y);
+ break;
+ case STBTT_vcurve:
+ stbtt__tesselate_curve(points, &num_points, x,y,
+ vertices[i].cx, vertices[i].cy,
+ vertices[i].x, vertices[i].y,
+ objspace_flatness_squared, 0);
+ x = vertices[i].x, y = vertices[i].y;
+ break;
+ case STBTT_vcubic:
+ stbtt__tesselate_cubic(points, &num_points, x,y,
+ vertices[i].cx, vertices[i].cy,
+ vertices[i].cx1, vertices[i].cy1,
+ vertices[i].x, vertices[i].y,
+ objspace_flatness_squared, 0);
+ x = vertices[i].x, y = vertices[i].y;
+ break;
+ }
+ }
+ (*contour_lengths)[n] = num_points - start;
+ }
+
+ return points;
+error:
+ STBTT_free(points, userdata);
+ STBTT_free(*contour_lengths, userdata);
+ *contour_lengths = 0;
+ *num_contours = 0;
+ return NULL;
+}
+
+STBTT_DEF void stbtt_Rasterize(stbtt__bitmap *result, float flatness_in_pixels, stbtt_vertex *vertices, int num_verts, float scale_x, float scale_y, float shift_x, float shift_y, int x_off, int y_off, int invert, void *userdata)
+{
+ float scale = scale_x > scale_y ? scale_y : scale_x;
+ int winding_count = 0;
+ int *winding_lengths = NULL;
+ stbtt__point *windings = stbtt_FlattenCurves(vertices, num_verts, flatness_in_pixels / scale, &winding_lengths, &winding_count, userdata);
+ if (windings) {
+ stbtt__rasterize(result, windings, winding_lengths, winding_count, scale_x, scale_y, shift_x, shift_y, x_off, y_off, invert, userdata);
+ STBTT_free(winding_lengths, userdata);
+ STBTT_free(windings, userdata);
+ }
+}
+
+STBTT_DEF void stbtt_FreeBitmap(unsigned char *bitmap, void *userdata)
+{
+ STBTT_free(bitmap, userdata);
+}
+
+STBTT_DEF unsigned char *stbtt_GetGlyphBitmapSubpixel(const stbtt_fontinfo *info, float scale_x, float scale_y, float shift_x, float shift_y, int glyph, int *width, int *height, int *xoff, int *yoff)
+{
+ int ix0,iy0,ix1,iy1;
+ stbtt__bitmap gbm;
+ stbtt_vertex *vertices;
+ int num_verts = stbtt_GetGlyphShape(info, glyph, &vertices);
+
+ if (scale_x == 0) scale_x = scale_y;
+ if (scale_y == 0) {
+ if (scale_x == 0) {
+ STBTT_free(vertices, info->userdata);
+ return NULL;
+ }
+ scale_y = scale_x;
+ }
+
+ stbtt_GetGlyphBitmapBoxSubpixel(info, glyph, scale_x, scale_y, shift_x, shift_y, &ix0,&iy0,&ix1,&iy1);
+
+ // now we get the size
+ gbm.w = (ix1 - ix0);
+ gbm.h = (iy1 - iy0);
+ gbm.pixels = NULL; // in case we error
+
+ if (width ) *width = gbm.w;
+ if (height) *height = gbm.h;
+ if (xoff ) *xoff = ix0;
+ if (yoff ) *yoff = iy0;
+
+ if (gbm.w && gbm.h) {
+ gbm.pixels = (unsigned char *) STBTT_malloc(gbm.w * gbm.h, info->userdata);
+ if (gbm.pixels) {
+ gbm.stride = gbm.w;
+
+ stbtt_Rasterize(&gbm, 0.35f, vertices, num_verts, scale_x, scale_y, shift_x, shift_y, ix0, iy0, 1, info->userdata);
+ }
+ }
+ STBTT_free(vertices, info->userdata);
+ return gbm.pixels;
+}
+
+STBTT_DEF unsigned char *stbtt_GetGlyphBitmap(const stbtt_fontinfo *info, float scale_x, float scale_y, int glyph, int *width, int *height, int *xoff, int *yoff)
+{
+ return stbtt_GetGlyphBitmapSubpixel(info, scale_x, scale_y, 0.0f, 0.0f, glyph, width, height, xoff, yoff);
+}
+
+STBTT_DEF void stbtt_MakeGlyphBitmapSubpixel(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int glyph)
+{
+ int ix0,iy0;
+ stbtt_vertex *vertices;
+ int num_verts = stbtt_GetGlyphShape(info, glyph, &vertices);
+ stbtt__bitmap gbm;
+
+ stbtt_GetGlyphBitmapBoxSubpixel(info, glyph, scale_x, scale_y, shift_x, shift_y, &ix0,&iy0,0,0);
+ gbm.pixels = output;
+ gbm.w = out_w;
+ gbm.h = out_h;
+ gbm.stride = out_stride;
+
+ if (gbm.w && gbm.h)
+ stbtt_Rasterize(&gbm, 0.35f, vertices, num_verts, scale_x, scale_y, shift_x, shift_y, ix0,iy0, 1, info->userdata);
+
+ STBTT_free(vertices, info->userdata);
+}
+
+STBTT_DEF void stbtt_MakeGlyphBitmap(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, int glyph)
+{
+ stbtt_MakeGlyphBitmapSubpixel(info, output, out_w, out_h, out_stride, scale_x, scale_y, 0.0f,0.0f, glyph);
+}
+
+STBTT_DEF unsigned char *stbtt_GetCodepointBitmapSubpixel(const stbtt_fontinfo *info, float scale_x, float scale_y, float shift_x, float shift_y, int codepoint, int *width, int *height, int *xoff, int *yoff)
+{
+ return stbtt_GetGlyphBitmapSubpixel(info, scale_x, scale_y,shift_x,shift_y, stbtt_FindGlyphIndex(info,codepoint), width,height,xoff,yoff);
+}
+
+STBTT_DEF void stbtt_MakeCodepointBitmapSubpixelPrefilter(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int oversample_x, int oversample_y, float *sub_x, float *sub_y, int codepoint)
+{
+ stbtt_MakeGlyphBitmapSubpixelPrefilter(info, output, out_w, out_h, out_stride, scale_x, scale_y, shift_x, shift_y, oversample_x, oversample_y, sub_x, sub_y, stbtt_FindGlyphIndex(info,codepoint));
+}
+
+STBTT_DEF void stbtt_MakeCodepointBitmapSubpixel(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int codepoint)
+{
+ stbtt_MakeGlyphBitmapSubpixel(info, output, out_w, out_h, out_stride, scale_x, scale_y, shift_x, shift_y, stbtt_FindGlyphIndex(info,codepoint));
+}
+
+STBTT_DEF unsigned char *stbtt_GetCodepointBitmap(const stbtt_fontinfo *info, float scale_x, float scale_y, int codepoint, int *width, int *height, int *xoff, int *yoff)
+{
+ return stbtt_GetCodepointBitmapSubpixel(info, scale_x, scale_y, 0.0f,0.0f, codepoint, width,height,xoff,yoff);
+}
+
+STBTT_DEF void stbtt_MakeCodepointBitmap(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, int codepoint)
+{
+ stbtt_MakeCodepointBitmapSubpixel(info, output, out_w, out_h, out_stride, scale_x, scale_y, 0.0f,0.0f, codepoint);
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// bitmap baking
+//
+// This is SUPER-CRAPPY packing to keep source code small
+
+static int stbtt_BakeFontBitmap_internal(unsigned char *data, int offset, // font location (use offset=0 for plain .ttf)
+ float pixel_height, // height of font in pixels
+ unsigned char *pixels, int pw, int ph, // bitmap to be filled in
+ int first_char, int num_chars, // characters to bake
+ stbtt_bakedchar *chardata)
+{
+ float scale;
+ int x,y,bottom_y, i;
+ stbtt_fontinfo f;
+ f.userdata = NULL;
+ if (!stbtt_InitFont(&f, data, offset))
+ return -1;
+ STBTT_memset(pixels, 0, pw*ph); // background of 0 around pixels
+ x=y=1;
+ bottom_y = 1;
+
+ scale = stbtt_ScaleForPixelHeight(&f, pixel_height);
+
+ for (i=0; i < num_chars; ++i) {
+ int advance, lsb, x0,y0,x1,y1,gw,gh;
+ int g = stbtt_FindGlyphIndex(&f, first_char + i);
+ stbtt_GetGlyphHMetrics(&f, g, &advance, &lsb);
+ stbtt_GetGlyphBitmapBox(&f, g, scale,scale, &x0,&y0,&x1,&y1);
+ gw = x1-x0;
+ gh = y1-y0;
+ if (x + gw + 1 >= pw)
+ y = bottom_y, x = 1; // advance to next row
+ if (y + gh + 1 >= ph) // check if it fits vertically AFTER potentially moving to next row
+ return -i;
+ STBTT_assert(x+gw < pw);
+ STBTT_assert(y+gh < ph);
+ stbtt_MakeGlyphBitmap(&f, pixels+x+y*pw, gw,gh,pw, scale,scale, g);
+ chardata[i].x0 = (stbtt_int16) x;
+ chardata[i].y0 = (stbtt_int16) y;
+ chardata[i].x1 = (stbtt_int16) (x + gw);
+ chardata[i].y1 = (stbtt_int16) (y + gh);
+ chardata[i].xadvance = scale * advance;
+ chardata[i].xoff = (float) x0;
+ chardata[i].yoff = (float) y0;
+ x = x + gw + 1;
+ if (y+gh+1 > bottom_y)
+ bottom_y = y+gh+1;
+ }
+ return bottom_y;
+}
+
+STBTT_DEF void stbtt_GetBakedQuad(const stbtt_bakedchar *chardata, int pw, int ph, int char_index, float *xpos, float *ypos, stbtt_aligned_quad *q, int opengl_fillrule)
+{
+ float d3d_bias = opengl_fillrule ? 0 : -0.5f;
+ float ipw = 1.0f / pw, iph = 1.0f / ph;
+ const stbtt_bakedchar *b = chardata + char_index;
+ int round_x = STBTT_ifloor((*xpos + b->xoff) + 0.5f);
+ int round_y = STBTT_ifloor((*ypos + b->yoff) + 0.5f);
+
+ q->x0 = round_x + d3d_bias;
+ q->y0 = round_y + d3d_bias;
+ q->x1 = round_x + b->x1 - b->x0 + d3d_bias;
+ q->y1 = round_y + b->y1 - b->y0 + d3d_bias;
+
+ q->s0 = b->x0 * ipw;
+ q->t0 = b->y0 * iph;
+ q->s1 = b->x1 * ipw;
+ q->t1 = b->y1 * iph;
+
+ *xpos += b->xadvance;
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// rectangle packing replacement routines if you don't have stb_rect_pack.h
+//
+
+#ifndef STB_RECT_PACK_VERSION
+
+typedef int stbrp_coord;
+
+////////////////////////////////////////////////////////////////////////////////////
+// //
+// //
+// COMPILER WARNING ?!?!? //
+// //
+// //
+// if you get a compile warning due to these symbols being defined more than //
+// once, move #include "stb_rect_pack.h" before #include "stb_truetype.h" //
+// //
+////////////////////////////////////////////////////////////////////////////////////
+
+typedef struct
+{
+ int width,height;
+ int x,y,bottom_y;
+} stbrp_context;
+
+typedef struct
+{
+ unsigned char x;
+} stbrp_node;
+
+struct stbrp_rect
+{
+ stbrp_coord x,y;
+ int id,w,h,was_packed;
+};
+
+static void stbrp_init_target(stbrp_context *con, int pw, int ph, stbrp_node *nodes, int num_nodes)
+{
+ con->width = pw;
+ con->height = ph;
+ con->x = 0;
+ con->y = 0;
+ con->bottom_y = 0;
+ STBTT__NOTUSED(nodes);
+ STBTT__NOTUSED(num_nodes);
+}
+
+static void stbrp_pack_rects(stbrp_context *con, stbrp_rect *rects, int num_rects)
+{
+ int i;
+ for (i=0; i < num_rects; ++i) {
+ if (con->x + rects[i].w > con->width) {
+ con->x = 0;
+ con->y = con->bottom_y;
+ }
+ if (con->y + rects[i].h > con->height)
+ break;
+ rects[i].x = con->x;
+ rects[i].y = con->y;
+ rects[i].was_packed = 1;
+ con->x += rects[i].w;
+ if (con->y + rects[i].h > con->bottom_y)
+ con->bottom_y = con->y + rects[i].h;
+ }
+ for ( ; i < num_rects; ++i)
+ rects[i].was_packed = 0;
+}
+#endif
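The fallback packer above is a plain shelf (row) packer: rectangles are placed left to right at a cursor, the cursor drops to a fresh shelf when a rectangle would overflow the right edge, and everything after the first vertical overflow is marked unpacked. The same strategy isolated into a standalone sketch (the `rect_t`/`shelf_pack` names are mine):

```c
#include <assert.h>

typedef struct { int x, y, w, h, was_packed; } rect_t;

/* Left-to-right shelf packing: place each rect at the cursor, start a
 * new shelf when it would overflow the right edge, and fail the rest
 * once a rect would overflow the bottom edge. */
static void shelf_pack(rect_t *rects, int num, int width, int height)
{
   int x = 0, y = 0, bottom = 0, i;
   for (i = 0; i < num; ++i) {
      if (x + rects[i].w > width) {    /* doesn't fit: next shelf */
         x = 0;
         y = bottom;
      }
      if (y + rects[i].h > height)     /* out of vertical space */
         break;
      rects[i].x = x;
      rects[i].y = y;
      rects[i].was_packed = 1;
      x += rects[i].w;
      if (y + rects[i].h > bottom)     /* shelf height = tallest rect */
         bottom = y + rects[i].h;
   }
   for ( ; i < num; ++i)
      rects[i].was_packed = 0;
}
```

Packing quality depends heavily on input order (tall rects inflate their whole shelf), which is why stb_rect_pack.h, when available, is the preferred backend.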
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// bitmap baking
+//
+// This is SUPER-AWESOME (tm Ryan Gordon) packing using stb_rect_pack.h. If
+// stb_rect_pack.h isn't available, it uses the BakeFontBitmap strategy.
+
+STBTT_DEF int stbtt_PackBegin(stbtt_pack_context *spc, unsigned char *pixels, int pw, int ph, int stride_in_bytes, int padding, void *alloc_context)
+{
+ stbrp_context *context = (stbrp_context *) STBTT_malloc(sizeof(*context) ,alloc_context);
+ int num_nodes = pw - padding;
+ stbrp_node *nodes = (stbrp_node *) STBTT_malloc(sizeof(*nodes ) * num_nodes,alloc_context);
+
+ if (context == NULL || nodes == NULL) {
+ if (context != NULL) STBTT_free(context, alloc_context);
+ if (nodes != NULL) STBTT_free(nodes , alloc_context);
+ return 0;
+ }
+
+ spc->user_allocator_context = alloc_context;
+ spc->width = pw;
+ spc->height = ph;
+ spc->pixels = pixels;
+ spc->pack_info = context;
+ spc->nodes = nodes;
+ spc->padding = padding;
+ spc->stride_in_bytes = stride_in_bytes != 0 ? stride_in_bytes : pw;
+ spc->h_oversample = 1;
+ spc->v_oversample = 1;
+ spc->skip_missing = 0;
+
+ stbrp_init_target(context, pw-padding, ph-padding, nodes, num_nodes);
+
+ if (pixels)
+ STBTT_memset(pixels, 0, pw*ph); // background of 0 around pixels
+
+ return 1;
+}
+
+STBTT_DEF void stbtt_PackEnd (stbtt_pack_context *spc)
+{
+ STBTT_free(spc->nodes , spc->user_allocator_context);
+ STBTT_free(spc->pack_info, spc->user_allocator_context);
+}
+
+STBTT_DEF void stbtt_PackSetOversampling(stbtt_pack_context *spc, unsigned int h_oversample, unsigned int v_oversample)
+{
+ STBTT_assert(h_oversample <= STBTT_MAX_OVERSAMPLE);
+ STBTT_assert(v_oversample <= STBTT_MAX_OVERSAMPLE);
+ if (h_oversample <= STBTT_MAX_OVERSAMPLE)
+ spc->h_oversample = h_oversample;
+ if (v_oversample <= STBTT_MAX_OVERSAMPLE)
+ spc->v_oversample = v_oversample;
+}
+
+STBTT_DEF void stbtt_PackSetSkipMissingCodepoints(stbtt_pack_context *spc, int skip)
+{
+ spc->skip_missing = skip;
+}
+
+#define STBTT__OVER_MASK (STBTT_MAX_OVERSAMPLE-1)
+
+static void stbtt__h_prefilter(unsigned char *pixels, int w, int h, int stride_in_bytes, unsigned int kernel_width)
+{
+ unsigned char buffer[STBTT_MAX_OVERSAMPLE];
+ int safe_w = w - kernel_width;
+ int j;
+ STBTT_memset(buffer, 0, STBTT_MAX_OVERSAMPLE); // suppress bogus warning from VS2013 -analyze
+ for (j=0; j < h; ++j) {
+ int i;
+ unsigned int total;
+ STBTT_memset(buffer, 0, kernel_width);
+
+ total = 0;
+
+ // make kernel_width a constant in common cases so compiler can optimize out the divide
+ switch (kernel_width) {
+ case 2:
+ for (i=0; i <= safe_w; ++i) {
+ total += pixels[i] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i];
+ pixels[i] = (unsigned char) (total / 2);
+ }
+ break;
+ case 3:
+ for (i=0; i <= safe_w; ++i) {
+ total += pixels[i] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i];
+ pixels[i] = (unsigned char) (total / 3);
+ }
+ break;
+ case 4:
+ for (i=0; i <= safe_w; ++i) {
+ total += pixels[i] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i];
+ pixels[i] = (unsigned char) (total / 4);
+ }
+ break;
+ case 5:
+ for (i=0; i <= safe_w; ++i) {
+ total += pixels[i] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i];
+ pixels[i] = (unsigned char) (total / 5);
+ }
+ break;
+ default:
+ for (i=0; i <= safe_w; ++i) {
+ total += pixels[i] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i];
+ pixels[i] = (unsigned char) (total / kernel_width);
+ }
+ break;
+ }
+
+ for (; i < w; ++i) {
+ STBTT_assert(pixels[i] == 0);
+ total -= buffer[i & STBTT__OVER_MASK];
+ pixels[i] = (unsigned char) (total / kernel_width);
+ }
+
+ pixels += stride_in_bytes;
+ }
+}
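The prefilter above is a single-pass box blur: a small power-of-two ring buffer remembers the last `kernel_width` input values, so each output is a running window sum divided by the window width, with the division unrolled for common widths. The core sliding-window trick as a standalone sketch over one row (fixed ring size and names are mine; like the library's version, it assumes the row ends in `kernel_width` zero pixels of padding):

```c
#include <assert.h>
#include <string.h>

#define MAX_KERNEL 8
#define KMASK (MAX_KERNEL - 1)

/* One-pass box blur over a row of bytes. The ring buffer holds the
 * value leaving the window at each step, so the running total is
 * updated in O(1) per pixel regardless of kernel width. */
static void box_filter_row(unsigned char *pixels, int w, unsigned int kernel_width)
{
   unsigned char ring[MAX_KERNEL];
   unsigned int total = 0;
   int i, safe_w = w - (int) kernel_width;
   memset(ring, 0, sizeof(ring));
   for (i = 0; i <= safe_w; ++i) {
      total += pixels[i] - ring[i & KMASK];           /* slide the window */
      ring[(i + kernel_width) & KMASK] = pixels[i];   /* value leaving later */
      pixels[i] = (unsigned char) (total / kernel_width);
   }
   for (; i < w; ++i) {                               /* drain the tail */
      total -= ring[i & KMASK];
      pixels[i] = (unsigned char) (total / kernel_width);
   }
}
```

Because the ring size is a power of two, the wrap-around is a single AND rather than a modulo, mirroring the `STBTT__OVER_MASK` trick above.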
+
+static void stbtt__v_prefilter(unsigned char *pixels, int w, int h, int stride_in_bytes, unsigned int kernel_width)
+{
+ unsigned char buffer[STBTT_MAX_OVERSAMPLE];
+ int safe_h = h - kernel_width;
+ int j;
+ STBTT_memset(buffer, 0, STBTT_MAX_OVERSAMPLE); // suppress bogus warning from VS2013 -analyze
+ for (j=0; j < w; ++j) {
+ int i;
+ unsigned int total;
+ STBTT_memset(buffer, 0, kernel_width);
+
+ total = 0;
+
+ // make kernel_width a constant in common cases so compiler can optimize out the divide
+ switch (kernel_width) {
+ case 2:
+ for (i=0; i <= safe_h; ++i) {
+ total += pixels[i*stride_in_bytes] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i*stride_in_bytes];
+ pixels[i*stride_in_bytes] = (unsigned char) (total / 2);
+ }
+ break;
+ case 3:
+ for (i=0; i <= safe_h; ++i) {
+ total += pixels[i*stride_in_bytes] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i*stride_in_bytes];
+ pixels[i*stride_in_bytes] = (unsigned char) (total / 3);
+ }
+ break;
+ case 4:
+ for (i=0; i <= safe_h; ++i) {
+ total += pixels[i*stride_in_bytes] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i*stride_in_bytes];
+ pixels[i*stride_in_bytes] = (unsigned char) (total / 4);
+ }
+ break;
+ case 5:
+ for (i=0; i <= safe_h; ++i) {
+ total += pixels[i*stride_in_bytes] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i*stride_in_bytes];
+ pixels[i*stride_in_bytes] = (unsigned char) (total / 5);
+ }
+ break;
+ default:
+ for (i=0; i <= safe_h; ++i) {
+ total += pixels[i*stride_in_bytes] - buffer[i & STBTT__OVER_MASK];
+ buffer[(i+kernel_width) & STBTT__OVER_MASK] = pixels[i*stride_in_bytes];
+ pixels[i*stride_in_bytes] = (unsigned char) (total / kernel_width);
+ }
+ break;
+ }
+
+ for (; i < h; ++i) {
+ STBTT_assert(pixels[i*stride_in_bytes] == 0);
+ total -= buffer[i & STBTT__OVER_MASK];
+ pixels[i*stride_in_bytes] = (unsigned char) (total / kernel_width);
+ }
+
+ pixels += 1;
+ }
+}
+
+static float stbtt__oversample_shift(int oversample)
+{
+ if (!oversample)
+ return 0.0f;
+
+ // The prefilter is a box filter of width "oversample",
+ // which shifts phase by (oversample - 1)/2 pixels in
+ // oversampled space. We want to shift in the opposite
+ // direction to counter this.
+ return (float)-(oversample - 1) / (2.0f * (float)oversample);
+}
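A width-`oversample` box filter delays the signal by (oversample - 1)/2 samples, so the compensating pre-shift is that amount in the opposite direction, divided by the oversample factor to convert from oversampled samples to output pixels. A tiny standalone check of the formula (function name is mine):

```c
#include <assert.h>
#include <math.h>

/* Subpixel shift that cancels the phase delay of a width-"oversample"
 * box filter: -(oversample-1)/2 samples, expressed in output pixels. */
static float oversample_shift(int oversample)
{
   if (!oversample)
      return 0.0f;
   return (float) -(oversample - 1) / (2.0f * (float) oversample);
}
```

For example, 3x oversampling needs a shift of -2/6 = -1/3 of an output pixel.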
+
+// rects array must be big enough to accommodate all characters in the given ranges
+STBTT_DEF int stbtt_PackFontRangesGatherRects(stbtt_pack_context *spc, const stbtt_fontinfo *info, stbtt_pack_range *ranges, int num_ranges, stbrp_rect *rects)
+{
+ int i,j,k;
+ int missing_glyph_added = 0;
+
+ k=0;
+ for (i=0; i < num_ranges; ++i) {
+ float fh = ranges[i].font_size;
+ float scale = fh > 0 ? stbtt_ScaleForPixelHeight(info, fh) : stbtt_ScaleForMappingEmToPixels(info, -fh);
+ ranges[i].h_oversample = (unsigned char) spc->h_oversample;
+ ranges[i].v_oversample = (unsigned char) spc->v_oversample;
+ for (j=0; j < ranges[i].num_chars; ++j) {
+ int x0,y0,x1,y1;
+ int codepoint = ranges[i].array_of_unicode_codepoints == NULL ? ranges[i].first_unicode_codepoint_in_range + j : ranges[i].array_of_unicode_codepoints[j];
+ int glyph = stbtt_FindGlyphIndex(info, codepoint);
+ if (glyph == 0 && (spc->skip_missing || missing_glyph_added)) {
+ rects[k].w = rects[k].h = 0;
+ } else {
+ stbtt_GetGlyphBitmapBoxSubpixel(info,glyph,
+ scale * spc->h_oversample,
+ scale * spc->v_oversample,
+ 0,0,
+ &x0,&y0,&x1,&y1);
+ rects[k].w = (stbrp_coord) (x1-x0 + spc->padding + spc->h_oversample-1);
+ rects[k].h = (stbrp_coord) (y1-y0 + spc->padding + spc->v_oversample-1);
+ if (glyph == 0)
+ missing_glyph_added = 1;
+ }
+ ++k;
+ }
+ }
+
+ return k;
+}
+
+STBTT_DEF void stbtt_MakeGlyphBitmapSubpixelPrefilter(const stbtt_fontinfo *info, unsigned char *output, int out_w, int out_h, int out_stride, float scale_x, float scale_y, float shift_x, float shift_y, int prefilter_x, int prefilter_y, float *sub_x, float *sub_y, int glyph)
+{
+ stbtt_MakeGlyphBitmapSubpixel(info,
+ output,
+ out_w - (prefilter_x - 1),
+ out_h - (prefilter_y - 1),
+ out_stride,
+ scale_x,
+ scale_y,
+ shift_x,
+ shift_y,
+ glyph);
+
+ if (prefilter_x > 1)
+ stbtt__h_prefilter(output, out_w, out_h, out_stride, prefilter_x);
+
+ if (prefilter_y > 1)
+ stbtt__v_prefilter(output, out_w, out_h, out_stride, prefilter_y);
+
+ *sub_x = stbtt__oversample_shift(prefilter_x);
+ *sub_y = stbtt__oversample_shift(prefilter_y);
+}
+
+// rects array must be big enough to accommodate all characters in the given ranges
+STBTT_DEF int stbtt_PackFontRangesRenderIntoRects(stbtt_pack_context *spc, const stbtt_fontinfo *info, stbtt_pack_range *ranges, int num_ranges, stbrp_rect *rects)
+{
+ int i,j,k, missing_glyph = -1, return_value = 1;
+
+ // save current values
+ int old_h_over = spc->h_oversample;
+ int old_v_over = spc->v_oversample;
+
+ k = 0;
+ for (i=0; i < num_ranges; ++i) {
+ float fh = ranges[i].font_size;
+ float scale = fh > 0 ? stbtt_ScaleForPixelHeight(info, fh) : stbtt_ScaleForMappingEmToPixels(info, -fh);
+ float recip_h,recip_v,sub_x,sub_y;
+ spc->h_oversample = ranges[i].h_oversample;
+ spc->v_oversample = ranges[i].v_oversample;
+ recip_h = 1.0f / spc->h_oversample;
+ recip_v = 1.0f / spc->v_oversample;
+ sub_x = stbtt__oversample_shift(spc->h_oversample);
+ sub_y = stbtt__oversample_shift(spc->v_oversample);
+ for (j=0; j < ranges[i].num_chars; ++j) {
+ stbrp_rect *r = &rects[k];
+ if (r->was_packed && r->w != 0 && r->h != 0) {
+ stbtt_packedchar *bc = &ranges[i].chardata_for_range[j];
+ int advance, lsb, x0,y0,x1,y1;
+ int codepoint = ranges[i].array_of_unicode_codepoints == NULL ? ranges[i].first_unicode_codepoint_in_range + j : ranges[i].array_of_unicode_codepoints[j];
+ int glyph = stbtt_FindGlyphIndex(info, codepoint);
+ stbrp_coord pad = (stbrp_coord) spc->padding;
+
+ // pad on left and top
+ r->x += pad;
+ r->y += pad;
+ r->w -= pad;
+ r->h -= pad;
+ stbtt_GetGlyphHMetrics(info, glyph, &advance, &lsb);
+ stbtt_GetGlyphBitmapBox(info, glyph,
+ scale * spc->h_oversample,
+ scale * spc->v_oversample,
+ &x0,&y0,&x1,&y1);
+ stbtt_MakeGlyphBitmapSubpixel(info,
+ spc->pixels + r->x + r->y*spc->stride_in_bytes,
+ r->w - spc->h_oversample+1,
+ r->h - spc->v_oversample+1,
+ spc->stride_in_bytes,
+ scale * spc->h_oversample,
+ scale * spc->v_oversample,
+ 0,0,
+ glyph);
+
+ if (spc->h_oversample > 1)
+ stbtt__h_prefilter(spc->pixels + r->x + r->y*spc->stride_in_bytes,
+ r->w, r->h, spc->stride_in_bytes,
+ spc->h_oversample);
+
+ if (spc->v_oversample > 1)
+ stbtt__v_prefilter(spc->pixels + r->x + r->y*spc->stride_in_bytes,
+ r->w, r->h, spc->stride_in_bytes,
+ spc->v_oversample);
+
+ bc->x0 = (stbtt_int16) r->x;
+ bc->y0 = (stbtt_int16) r->y;
+ bc->x1 = (stbtt_int16) (r->x + r->w);
+ bc->y1 = (stbtt_int16) (r->y + r->h);
+ bc->xadvance = scale * advance;
+ bc->xoff = (float) x0 * recip_h + sub_x;
+ bc->yoff = (float) y0 * recip_v + sub_y;
+ bc->xoff2 = (x0 + r->w) * recip_h + sub_x;
+ bc->yoff2 = (y0 + r->h) * recip_v + sub_y;
+
+ if (glyph == 0)
+ missing_glyph = j;
+ } else if (spc->skip_missing) {
+ return_value = 0;
+ } else if (r->was_packed && r->w == 0 && r->h == 0 && missing_glyph >= 0) {
+ ranges[i].chardata_for_range[j] = ranges[i].chardata_for_range[missing_glyph];
+ } else {
+ return_value = 0; // if any fail, report failure
+ }
+
+ ++k;
+ }
+ }
+
+ // restore original values
+ spc->h_oversample = old_h_over;
+ spc->v_oversample = old_v_over;
+
+ return return_value;
+}
+
+STBTT_DEF void stbtt_PackFontRangesPackRects(stbtt_pack_context *spc, stbrp_rect *rects, int num_rects)
+{
+ stbrp_pack_rects((stbrp_context *) spc->pack_info, rects, num_rects);
+}
+
+STBTT_DEF int stbtt_PackFontRanges(stbtt_pack_context *spc, const unsigned char *fontdata, int font_index, stbtt_pack_range *ranges, int num_ranges)
+{
+ stbtt_fontinfo info;
+ int i,j,n, return_value = 1;
+ //stbrp_context *context = (stbrp_context *) spc->pack_info;
+ stbrp_rect *rects;
+
+ // flag all characters as NOT packed
+ for (i=0; i < num_ranges; ++i)
+ for (j=0; j < ranges[i].num_chars; ++j)
+ ranges[i].chardata_for_range[j].x0 =
+ ranges[i].chardata_for_range[j].y0 =
+ ranges[i].chardata_for_range[j].x1 =
+ ranges[i].chardata_for_range[j].y1 = 0;
+
+ n = 0;
+ for (i=0; i < num_ranges; ++i)
+ n += ranges[i].num_chars;
+
+ rects = (stbrp_rect *) STBTT_malloc(sizeof(*rects) * n, spc->user_allocator_context);
+ if (rects == NULL)
+ return 0;
+
+ info.userdata = spc->user_allocator_context;
+ stbtt_InitFont(&info, fontdata, stbtt_GetFontOffsetForIndex(fontdata,font_index));
+
+ n = stbtt_PackFontRangesGatherRects(spc, &info, ranges, num_ranges, rects);
+
+ stbtt_PackFontRangesPackRects(spc, rects, n);
+
+ return_value = stbtt_PackFontRangesRenderIntoRects(spc, &info, ranges, num_ranges, rects);
+
+ STBTT_free(rects, spc->user_allocator_context);
+ return return_value;
+}
+
+STBTT_DEF int stbtt_PackFontRange(stbtt_pack_context *spc, const unsigned char *fontdata, int font_index, float font_size,
+ int first_unicode_codepoint_in_range, int num_chars_in_range, stbtt_packedchar *chardata_for_range)
+{
+ stbtt_pack_range range;
+ range.first_unicode_codepoint_in_range = first_unicode_codepoint_in_range;
+ range.array_of_unicode_codepoints = NULL;
+ range.num_chars = num_chars_in_range;
+ range.chardata_for_range = chardata_for_range;
+ range.font_size = font_size;
+ return stbtt_PackFontRanges(spc, fontdata, font_index, &range, 1);
+}
+
+STBTT_DEF void stbtt_GetScaledFontVMetrics(const unsigned char *fontdata, int index, float size, float *ascent, float *descent, float *lineGap)
+{
+ int i_ascent, i_descent, i_lineGap;
+ float scale;
+ stbtt_fontinfo info;
+ stbtt_InitFont(&info, fontdata, stbtt_GetFontOffsetForIndex(fontdata, index));
+ scale = size > 0 ? stbtt_ScaleForPixelHeight(&info, size) : stbtt_ScaleForMappingEmToPixels(&info, -size);
+ stbtt_GetFontVMetrics(&info, &i_ascent, &i_descent, &i_lineGap);
+ *ascent = (float) i_ascent * scale;
+ *descent = (float) i_descent * scale;
+ *lineGap = (float) i_lineGap * scale;
+}
+
+STBTT_DEF void stbtt_GetPackedQuad(const stbtt_packedchar *chardata, int pw, int ph, int char_index, float *xpos, float *ypos, stbtt_aligned_quad *q, int align_to_integer)
+{
+ float ipw = 1.0f / pw, iph = 1.0f / ph;
+ const stbtt_packedchar *b = chardata + char_index;
+
+ if (align_to_integer) {
+ float x = (float) STBTT_ifloor((*xpos + b->xoff) + 0.5f);
+ float y = (float) STBTT_ifloor((*ypos + b->yoff) + 0.5f);
+ q->x0 = x;
+ q->y0 = y;
+ q->x1 = x + b->xoff2 - b->xoff;
+ q->y1 = y + b->yoff2 - b->yoff;
+ } else {
+ q->x0 = *xpos + b->xoff;
+ q->y0 = *ypos + b->yoff;
+ q->x1 = *xpos + b->xoff2;
+ q->y1 = *ypos + b->yoff2;
+ }
+
+ q->s0 = b->x0 * ipw;
+ q->t0 = b->y0 * iph;
+ q->s1 = b->x1 * ipw;
+ q->t1 = b->y1 * iph;
+
+ *xpos += b->xadvance;
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// sdf computation
+//
+
+#define STBTT_min(a,b) ((a) < (b) ? (a) : (b))
+#define STBTT_max(a,b) ((a) < (b) ? (b) : (a))
+
+static int stbtt__ray_intersect_bezier(float orig[2], float ray[2], float q0[2], float q1[2], float q2[2], float hits[2][2])
+{
+ float q0perp = q0[1]*ray[0] - q0[0]*ray[1];
+ float q1perp = q1[1]*ray[0] - q1[0]*ray[1];
+ float q2perp = q2[1]*ray[0] - q2[0]*ray[1];
+ float roperp = orig[1]*ray[0] - orig[0]*ray[1];
+
+ float a = q0perp - 2*q1perp + q2perp;
+ float b = q1perp - q0perp;
+ float c = q0perp - roperp;
+
+ float s0 = 0., s1 = 0.;
+ int num_s = 0;
+
+ if (a != 0.0) {
+ float discr = b*b - a*c;
+ if (discr > 0.0) {
+ float rcpna = -1 / a;
+ float d = (float) STBTT_sqrt(discr);
+ s0 = (b+d) * rcpna;
+ s1 = (b-d) * rcpna;
+ if (s0 >= 0.0 && s0 <= 1.0)
+ num_s = 1;
+ if (d > 0.0 && s1 >= 0.0 && s1 <= 1.0) {
+ if (num_s == 0) s0 = s1;
+ ++num_s;
+ }
+ }
+ } else {
+ // 2*b*s + c = 0
+ // s = -c / (2*b)
+ s0 = c / (-2 * b);
+ if (s0 >= 0.0 && s0 <= 1.0)
+ num_s = 1;
+ }
+
+ if (num_s == 0)
+ return 0;
+ else {
+ float rcp_len2 = 1 / (ray[0]*ray[0] + ray[1]*ray[1]);
+ float rayn_x = ray[0] * rcp_len2, rayn_y = ray[1] * rcp_len2;
+
+ float q0d = q0[0]*rayn_x + q0[1]*rayn_y;
+ float q1d = q1[0]*rayn_x + q1[1]*rayn_y;
+ float q2d = q2[0]*rayn_x + q2[1]*rayn_y;
+ float rod = orig[0]*rayn_x + orig[1]*rayn_y;
+
+ float q10d = q1d - q0d;
+ float q20d = q2d - q0d;
+ float q0rd = q0d - rod;
+
+ hits[0][0] = q0rd + s0*(2.0f - 2.0f*s0)*q10d + s0*s0*q20d;
+ hits[0][1] = a*s0+b;
+
+ if (num_s > 1) {
+ hits[1][0] = q0rd + s1*(2.0f - 2.0f*s1)*q10d + s1*s1*q20d;
+ hits[1][1] = a*s1+b;
+ return 2;
+ } else {
+ return 1;
+ }
+ }
+}
+
+static int equal(float *a, float *b)
+{
+ return (a[0] == b[0] && a[1] == b[1]);
+}
+
+static int stbtt__compute_crossings_x(float x, float y, int nverts, stbtt_vertex *verts)
+{
+ int i;
+ float orig[2], ray[2] = { 1, 0 };
+ float y_frac;
+ int winding = 0;
+
+ // make sure y never passes through a vertex of the shape
+ y_frac = (float) STBTT_fmod(y, 1.0f);
+ if (y_frac < 0.01f)
+ y += 0.01f;
+ else if (y_frac > 0.99f)
+ y -= 0.01f;
+
+ orig[0] = x;
+ orig[1] = y;
+
+ // test a ray from (-infinity,y) to (x,y)
+ for (i=0; i < nverts; ++i) {
+ if (verts[i].type == STBTT_vline) {
+ int x0 = (int) verts[i-1].x, y0 = (int) verts[i-1].y;
+ int x1 = (int) verts[i ].x, y1 = (int) verts[i ].y;
+ if (y > STBTT_min(y0,y1) && y < STBTT_max(y0,y1) && x > STBTT_min(x0,x1)) {
+ float x_inter = (y - y0) / (y1 - y0) * (x1-x0) + x0;
+ if (x_inter < x)
+ winding += (y0 < y1) ? 1 : -1;
+ }
+ }
+ if (verts[i].type == STBTT_vcurve) {
+ int x0 = (int) verts[i-1].x , y0 = (int) verts[i-1].y ;
+ int x1 = (int) verts[i ].cx, y1 = (int) verts[i ].cy;
+ int x2 = (int) verts[i ].x , y2 = (int) verts[i ].y ;
+ int ax = STBTT_min(x0,STBTT_min(x1,x2)), ay = STBTT_min(y0,STBTT_min(y1,y2));
+ int by = STBTT_max(y0,STBTT_max(y1,y2));
+ if (y > ay && y < by && x > ax) {
+ float q0[2],q1[2],q2[2];
+ float hits[2][2];
+ q0[0] = (float)x0;
+ q0[1] = (float)y0;
+ q1[0] = (float)x1;
+ q1[1] = (float)y1;
+ q2[0] = (float)x2;
+ q2[1] = (float)y2;
+ if (equal(q0,q1) || equal(q1,q2)) {
+ x0 = (int)verts[i-1].x;
+ y0 = (int)verts[i-1].y;
+ x1 = (int)verts[i ].x;
+ y1 = (int)verts[i ].y;
+ if (y > STBTT_min(y0,y1) && y < STBTT_max(y0,y1) && x > STBTT_min(x0,x1)) {
+ float x_inter = (y - y0) / (y1 - y0) * (x1-x0) + x0;
+ if (x_inter < x)
+ winding += (y0 < y1) ? 1 : -1;
+ }
+ } else {
+ int num_hits = stbtt__ray_intersect_bezier(orig, ray, q0, q1, q2, hits);
+ if (num_hits >= 1)
+ if (hits[0][0] < 0)
+ winding += (hits[0][1] < 0 ? -1 : 1);
+ if (num_hits >= 2)
+ if (hits[1][0] < 0)
+ winding += (hits[1][1] < 0 ? -1 : 1);
+ }
+ }
+ }
+ }
+ return winding;
+}
+
+static float stbtt__cuberoot( float x )
+{
+ if (x<0)
+ return -(float) STBTT_pow(-x,1.0f/3.0f);
+ else
+ return (float) STBTT_pow( x,1.0f/3.0f);
+}
+
+// x^3 + a*x^2 + b*x + c = 0
+static int stbtt__solve_cubic(float a, float b, float c, float* r)
+{
+ float s = -a / 3;
+ float p = b - a*a / 3;
+ float q = a * (2*a*a - 9*b) / 27 + c;
+ float p3 = p*p*p;
+ float d = q*q + 4*p3 / 27;
+ if (d >= 0) {
+ float z = (float) STBTT_sqrt(d);
+ float u = (-q + z) / 2;
+ float v = (-q - z) / 2;
+ u = stbtt__cuberoot(u);
+ v = stbtt__cuberoot(v);
+ r[0] = s + u + v;
+ return 1;
+ } else {
+ float u = (float) STBTT_sqrt(-p/3);
+ float v = (float) STBTT_acos(-STBTT_sqrt(-27/p3) * q / 2) / 3; // p3 must be negative, since d is negative
+ float m = (float) STBTT_cos(v);
+ float n = (float) STBTT_cos(v-3.141592/2)*1.732050808f;
+ r[0] = s + u * 2 * m;
+ r[1] = s - u * (m + n);
+ r[2] = s - u * (m - n);
+
+ //STBTT_assert( STBTT_fabs(((r[0]+a)*r[0]+b)*r[0]+c) < 0.05f); // these asserts may not be safe at all scales, though they're in bezier t parameter units so maybe?
+ //STBTT_assert( STBTT_fabs(((r[1]+a)*r[1]+b)*r[1]+c) < 0.05f);
+ //STBTT_assert( STBTT_fabs(((r[2]+a)*r[2]+b)*r[2]+c) < 0.05f);
+ return 3;
+ }
+}
+
+STBTT_DEF unsigned char * stbtt_GetGlyphSDF(const stbtt_fontinfo *info, float scale, int glyph, int padding, unsigned char onedge_value, float pixel_dist_scale, int *width, int *height, int *xoff, int *yoff)
+{
+ float scale_x = scale, scale_y = scale;
+ int ix0,iy0,ix1,iy1;
+ int w,h;
+ unsigned char *data;
+
+ if (scale == 0) return NULL;
+
+ stbtt_GetGlyphBitmapBoxSubpixel(info, glyph, scale, scale, 0.0f,0.0f, &ix0,&iy0,&ix1,&iy1);
+
+ // if empty, return NULL
+ if (ix0 == ix1 || iy0 == iy1)
+ return NULL;
+
+ ix0 -= padding;
+ iy0 -= padding;
+ ix1 += padding;
+ iy1 += padding;
+
+ w = (ix1 - ix0);
+ h = (iy1 - iy0);
+
+ if (width ) *width = w;
+ if (height) *height = h;
+ if (xoff ) *xoff = ix0;
+ if (yoff ) *yoff = iy0;
+
+ // invert for y-downwards bitmaps
+ scale_y = -scale_y;
+
+ {
+ // distance from singular values (in the same units as the pixel grid)
+ const float eps = 1./1024, eps2 = eps*eps;
+ int x,y,i,j;
+ float *precompute;
+ stbtt_vertex *verts;
+ int num_verts = stbtt_GetGlyphShape(info, glyph, &verts);
+ data = (unsigned char *) STBTT_malloc(w * h, info->userdata);
+ precompute = (float *) STBTT_malloc(num_verts * sizeof(float), info->userdata);
+
+ for (i=0,j=num_verts-1; i < num_verts; j=i++) {
+ if (verts[i].type == STBTT_vline) {
+ float x0 = verts[i].x*scale_x, y0 = verts[i].y*scale_y;
+ float x1 = verts[j].x*scale_x, y1 = verts[j].y*scale_y;
+ float dist = (float) STBTT_sqrt((x1-x0)*(x1-x0) + (y1-y0)*(y1-y0));
+ precompute[i] = (dist < eps) ? 0.0f : 1.0f / dist;
+ } else if (verts[i].type == STBTT_vcurve) {
+ float x2 = verts[j].x *scale_x, y2 = verts[j].y *scale_y;
+ float x1 = verts[i].cx*scale_x, y1 = verts[i].cy*scale_y;
+ float x0 = verts[i].x *scale_x, y0 = verts[i].y *scale_y;
+ float bx = x0 - 2*x1 + x2, by = y0 - 2*y1 + y2;
+ float len2 = bx*bx + by*by;
+ if (len2 >= eps2)
+ precompute[i] = 1.0f / len2;
+ else
+ precompute[i] = 0.0f;
+ } else
+ precompute[i] = 0.0f;
+ }
+
+ for (y=iy0; y < iy1; ++y) {
+ for (x=ix0; x < ix1; ++x) {
+ float val;
+ float min_dist = 999999.0f;
+ float sx = (float) x + 0.5f;
+ float sy = (float) y + 0.5f;
+ float x_gspace = (sx / scale_x);
+ float y_gspace = (sy / scale_y);
+
+ int winding = stbtt__compute_crossings_x(x_gspace, y_gspace, num_verts, verts); // @OPTIMIZE: this could just be a rasterization, but needs to be line vs. non-tesselated curves so a new path
+
+ for (i=0; i < num_verts; ++i) {
+ float x0 = verts[i].x*scale_x, y0 = verts[i].y*scale_y;
+
+ if (verts[i].type == STBTT_vline && precompute[i] != 0.0f) {
+ float x1 = verts[i-1].x*scale_x, y1 = verts[i-1].y*scale_y;
+
+ float dist,dist2 = (x0-sx)*(x0-sx) + (y0-sy)*(y0-sy);
+ if (dist2 < min_dist*min_dist)
+ min_dist = (float) STBTT_sqrt(dist2);
+
+ // coarse culling against bbox
+ //if (sx > STBTT_min(x0,x1)-min_dist && sx < STBTT_max(x0,x1)+min_dist &&
+ // sy > STBTT_min(y0,y1)-min_dist && sy < STBTT_max(y0,y1)+min_dist)
+ dist = (float) STBTT_fabs((x1-x0)*(y0-sy) - (y1-y0)*(x0-sx)) * precompute[i];
+ STBTT_assert(i != 0);
+ if (dist < min_dist) {
+ // check position along line
+ // x' = x0 + t*(x1-x0), y' = y0 + t*(y1-y0)
+ // minimize (x'-sx)*(x'-sx)+(y'-sy)*(y'-sy)
+ float dx = x1-x0, dy = y1-y0;
+ float px = x0-sx, py = y0-sy;
+ // minimize (px+t*dx)^2 + (py+t*dy)^2 = px*px + 2*px*dx*t + t^2*dx*dx + py*py + 2*py*dy*t + t^2*dy*dy
+ // derivative: 2*px*dx + 2*py*dy + (2*dx*dx+2*dy*dy)*t, set to 0 and solve
+ float t = -(px*dx + py*dy) / (dx*dx + dy*dy);
+ if (t >= 0.0f && t <= 1.0f)
+ min_dist = dist;
+ }
+ } else if (verts[i].type == STBTT_vcurve) {
+ float x2 = verts[i-1].x *scale_x, y2 = verts[i-1].y *scale_y;
+ float x1 = verts[i ].cx*scale_x, y1 = verts[i ].cy*scale_y;
+ float box_x0 = STBTT_min(STBTT_min(x0,x1),x2);
+ float box_y0 = STBTT_min(STBTT_min(y0,y1),y2);
+ float box_x1 = STBTT_max(STBTT_max(x0,x1),x2);
+ float box_y1 = STBTT_max(STBTT_max(y0,y1),y2);
+ // coarse culling against bbox to avoid computing cubic unnecessarily
+ if (sx > box_x0-min_dist && sx < box_x1+min_dist && sy > box_y0-min_dist && sy < box_y1+min_dist) {
+ int num=0;
+ float ax = x1-x0, ay = y1-y0;
+ float bx = x0 - 2*x1 + x2, by = y0 - 2*y1 + y2;
+ float mx = x0 - sx, my = y0 - sy;
+ float res[3] = {0.f,0.f,0.f};
+ float px,py,t,it,dist2;
+ float a_inv = precompute[i];
+ if (a_inv == 0.0) { // if a_inv is 0, it's 2nd degree so use quadratic formula
+ float a = 3*(ax*bx + ay*by);
+ float b = 2*(ax*ax + ay*ay) + (mx*bx+my*by);
+ float c = mx*ax+my*ay;
+ if (STBTT_fabs(a) < eps2) { // if a is 0, it's linear
+ if (STBTT_fabs(b) >= eps2) {
+ res[num++] = -c/b;
+ }
+ } else {
+ float discriminant = b*b - 4*a*c;
+ if (discriminant < 0)
+ num = 0;
+ else {
+ float root = (float) STBTT_sqrt(discriminant);
+ res[0] = (-b - root)/(2*a);
+ res[1] = (-b + root)/(2*a);
+ num = 2; // don't bother distinguishing 1-solution case, as code below will still work
+ }
+ }
+ } else {
+ float b = 3*(ax*bx + ay*by) * a_inv; // could precompute this as it doesn't depend on sample point
+ float c = (2*(ax*ax + ay*ay) + (mx*bx+my*by)) * a_inv;
+ float d = (mx*ax+my*ay) * a_inv;
+ num = stbtt__solve_cubic(b, c, d, res);
+ }
+ dist2 = (x0-sx)*(x0-sx) + (y0-sy)*(y0-sy);
+ if (dist2 < min_dist*min_dist)
+ min_dist = (float) STBTT_sqrt(dist2);
+
+ if (num >= 1 && res[0] >= 0.0f && res[0] <= 1.0f) {
+ t = res[0], it = 1.0f - t;
+ px = it*it*x0 + 2*t*it*x1 + t*t*x2;
+ py = it*it*y0 + 2*t*it*y1 + t*t*y2;
+ dist2 = (px-sx)*(px-sx) + (py-sy)*(py-sy);
+ if (dist2 < min_dist * min_dist)
+ min_dist = (float) STBTT_sqrt(dist2);
+ }
+ if (num >= 2 && res[1] >= 0.0f && res[1] <= 1.0f) {
+ t = res[1], it = 1.0f - t;
+ px = it*it*x0 + 2*t*it*x1 + t*t*x2;
+ py = it*it*y0 + 2*t*it*y1 + t*t*y2;
+ dist2 = (px-sx)*(px-sx) + (py-sy)*(py-sy);
+ if (dist2 < min_dist * min_dist)
+ min_dist = (float) STBTT_sqrt(dist2);
+ }
+ if (num >= 3 && res[2] >= 0.0f && res[2] <= 1.0f) {
+ t = res[2], it = 1.0f - t;
+ px = it*it*x0 + 2*t*it*x1 + t*t*x2;
+ py = it*it*y0 + 2*t*it*y1 + t*t*y2;
+ dist2 = (px-sx)*(px-sx) + (py-sy)*(py-sy);
+ if (dist2 < min_dist * min_dist)
+ min_dist = (float) STBTT_sqrt(dist2);
+ }
+ }
+ }
+ }
+ if (winding == 0)
+ min_dist = -min_dist; // if outside the shape, value is negative
+ val = onedge_value + pixel_dist_scale * min_dist;
+ if (val < 0)
+ val = 0;
+ else if (val > 255)
+ val = 255;
+ data[(y-iy0)*w+(x-ix0)] = (unsigned char) val;
+ }
+ }
+ STBTT_free(precompute, info->userdata);
+ STBTT_free(verts, info->userdata);
+ }
+ return data;
+}
+
+STBTT_DEF unsigned char * stbtt_GetCodepointSDF(const stbtt_fontinfo *info, float scale, int codepoint, int padding, unsigned char onedge_value, float pixel_dist_scale, int *width, int *height, int *xoff, int *yoff)
+{
+ return stbtt_GetGlyphSDF(info, scale, stbtt_FindGlyphIndex(info, codepoint), padding, onedge_value, pixel_dist_scale, width, height, xoff, yoff);
+}
+
+STBTT_DEF void stbtt_FreeSDF(unsigned char *bitmap, void *userdata)
+{
+ STBTT_free(bitmap, userdata);
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// font name matching -- recommended not to use this
+//
+
+// check if a utf8 string contains a prefix which is the utf16 string; if so return length of matching utf8 string
+static stbtt_int32 stbtt__CompareUTF8toUTF16_bigendian_prefix(stbtt_uint8 *s1, stbtt_int32 len1, stbtt_uint8 *s2, stbtt_int32 len2)
+{
+ stbtt_int32 i=0;
+
+ // convert utf16 to utf8 and compare the results while converting
+ while (len2) {
+ stbtt_uint16 ch = s2[0]*256 + s2[1];
+ if (ch < 0x80) {
+ if (i >= len1) return -1;
+ if (s1[i++] != ch) return -1;
+ } else if (ch < 0x800) {
+ if (i+1 >= len1) return -1;
+ if (s1[i++] != 0xc0 + (ch >> 6)) return -1;
+ if (s1[i++] != 0x80 + (ch & 0x3f)) return -1;
+ } else if (ch >= 0xd800 && ch < 0xdc00) {
+ stbtt_uint32 c;
+ stbtt_uint16 ch2 = s2[2]*256 + s2[3];
+ if (i+3 >= len1) return -1;
+ c = ((ch - 0xd800) << 10) + (ch2 - 0xdc00) + 0x10000;
+ if (s1[i++] != 0xf0 + (c >> 18)) return -1;
+ if (s1[i++] != 0x80 + ((c >> 12) & 0x3f)) return -1;
+ if (s1[i++] != 0x80 + ((c >> 6) & 0x3f)) return -1;
+ if (s1[i++] != 0x80 + ((c ) & 0x3f)) return -1;
+ s2 += 2; // plus another 2 below
+ len2 -= 2;
+ } else if (ch >= 0xdc00 && ch < 0xe000) {
+ return -1;
+ } else {
+ if (i+2 >= len1) return -1;
+ if (s1[i++] != 0xe0 + (ch >> 12)) return -1;
+ if (s1[i++] != 0x80 + ((ch >> 6) & 0x3f)) return -1;
+ if (s1[i++] != 0x80 + ((ch ) & 0x3f)) return -1;
+ }
+ s2 += 2;
+ len2 -= 2;
+ }
+ return i;
+}
+
+static int stbtt_CompareUTF8toUTF16_bigendian_internal(char *s1, int len1, char *s2, int len2)
+{
+ return len1 == stbtt__CompareUTF8toUTF16_bigendian_prefix((stbtt_uint8*) s1, len1, (stbtt_uint8*) s2, len2);
+}
+
+// returns results in whatever encoding you request... but note that 2-byte encodings
+// will be BIG-ENDIAN... use stbtt_CompareUTF8toUTF16_bigendian() to compare
+STBTT_DEF const char *stbtt_GetFontNameString(const stbtt_fontinfo *font, int *length, int platformID, int encodingID, int languageID, int nameID)
+{
+ stbtt_int32 i,count,stringOffset;
+ stbtt_uint8 *fc = font->data;
+ stbtt_uint32 offset = font->fontstart;
+ stbtt_uint32 nm = stbtt__find_table(fc, offset, "name");
+ if (!nm) return NULL;
+
+ count = ttUSHORT(fc+nm+2);
+ stringOffset = nm + ttUSHORT(fc+nm+4);
+ for (i=0; i < count; ++i) {
+ stbtt_uint32 loc = nm + 6 + 12 * i;
+ if (platformID == ttUSHORT(fc+loc+0) && encodingID == ttUSHORT(fc+loc+2)
+ && languageID == ttUSHORT(fc+loc+4) && nameID == ttUSHORT(fc+loc+6)) {
+ *length = ttUSHORT(fc+loc+8);
+ return (const char *) (fc+stringOffset+ttUSHORT(fc+loc+10));
+ }
+ }
+ return NULL;
+}
+
+static int stbtt__matchpair(stbtt_uint8 *fc, stbtt_uint32 nm, stbtt_uint8 *name, stbtt_int32 nlen, stbtt_int32 target_id, stbtt_int32 next_id)
+{
+ stbtt_int32 i;
+ stbtt_int32 count = ttUSHORT(fc+nm+2);
+ stbtt_int32 stringOffset = nm + ttUSHORT(fc+nm+4);
+
+ for (i=0; i < count; ++i) {
+ stbtt_uint32 loc = nm + 6 + 12 * i;
+ stbtt_int32 id = ttUSHORT(fc+loc+6);
+ if (id == target_id) {
+ // find the encoding
+ stbtt_int32 platform = ttUSHORT(fc+loc+0), encoding = ttUSHORT(fc+loc+2), language = ttUSHORT(fc+loc+4);
+
+ // is this a Unicode encoding?
+ if (platform == 0 || (platform == 3 && encoding == 1) || (platform == 3 && encoding == 10)) {
+ stbtt_int32 slen = ttUSHORT(fc+loc+8);
+ stbtt_int32 off = ttUSHORT(fc+loc+10);
+
+ // check if there's a prefix match
+ stbtt_int32 matchlen = stbtt__CompareUTF8toUTF16_bigendian_prefix(name, nlen, fc+stringOffset+off,slen);
+ if (matchlen >= 0) {
+ // check for target_id+1 immediately following, with same encoding & language
+ if (i+1 < count && ttUSHORT(fc+loc+12+6) == next_id && ttUSHORT(fc+loc+12) == platform && ttUSHORT(fc+loc+12+2) == encoding && ttUSHORT(fc+loc+12+4) == language) {
+ slen = ttUSHORT(fc+loc+12+8);
+ off = ttUSHORT(fc+loc+12+10);
+ if (slen == 0) {
+ if (matchlen == nlen)
+ return 1;
+ } else if (matchlen < nlen && name[matchlen] == ' ') {
+ ++matchlen;
+ if (stbtt_CompareUTF8toUTF16_bigendian_internal((char*) (name+matchlen), nlen-matchlen, (char*)(fc+stringOffset+off),slen))
+ return 1;
+ }
+ } else {
+ // if nothing immediately following
+ if (matchlen == nlen)
+ return 1;
+ }
+ }
+ }
+
+ // @TODO handle other encodings
+ }
+ }
+ return 0;
+}
+
+static int stbtt__matches(stbtt_uint8 *fc, stbtt_uint32 offset, stbtt_uint8 *name, stbtt_int32 flags)
+{
+ stbtt_int32 nlen = (stbtt_int32) STBTT_strlen((char *) name);
+ stbtt_uint32 nm,hd;
+ if (!stbtt__isfont(fc+offset)) return 0;
+
+ // check italics/bold/underline flags in macStyle...
+ if (flags) {
+ hd = stbtt__find_table(fc, offset, "head");
+ if ((ttUSHORT(fc+hd+44) & 7) != (flags & 7)) return 0;
+ }
+
+ nm = stbtt__find_table(fc, offset, "name");
+ if (!nm) return 0;
+
+ if (flags) {
+ // if we checked the macStyle flags, then just check the family and ignore the subfamily
+ if (stbtt__matchpair(fc, nm, name, nlen, 16, -1)) return 1;
+ if (stbtt__matchpair(fc, nm, name, nlen, 1, -1)) return 1;
+ if (stbtt__matchpair(fc, nm, name, nlen, 3, -1)) return 1;
+ } else {
+ if (stbtt__matchpair(fc, nm, name, nlen, 16, 17)) return 1;
+ if (stbtt__matchpair(fc, nm, name, nlen, 1, 2)) return 1;
+ if (stbtt__matchpair(fc, nm, name, nlen, 3, -1)) return 1;
+ }
+
+ return 0;
+}
+
+static int stbtt_FindMatchingFont_internal(unsigned char *font_collection, char *name_utf8, stbtt_int32 flags)
+{
+ stbtt_int32 i;
+ for (i=0;;++i) {
+ stbtt_int32 off = stbtt_GetFontOffsetForIndex(font_collection, i);
+ if (off < 0) return off;
+ if (stbtt__matches((stbtt_uint8 *) font_collection, off, (stbtt_uint8*) name_utf8, flags))
+ return off;
+ }
+}
+
+#if defined(__GNUC__) || defined(__clang__)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+STBTT_DEF int stbtt_BakeFontBitmap(const unsigned char *data, int offset,
+ float pixel_height, unsigned char *pixels, int pw, int ph,
+ int first_char, int num_chars, stbtt_bakedchar *chardata)
+{
+ return stbtt_BakeFontBitmap_internal((unsigned char *) data, offset, pixel_height, pixels, pw, ph, first_char, num_chars, chardata);
+}
+
+STBTT_DEF int stbtt_GetFontOffsetForIndex(const unsigned char *data, int index)
+{
+ return stbtt_GetFontOffsetForIndex_internal((unsigned char *) data, index);
+}
+
+STBTT_DEF int stbtt_GetNumberOfFonts(const unsigned char *data)
+{
+ return stbtt_GetNumberOfFonts_internal((unsigned char *) data);
+}
+
+STBTT_DEF int stbtt_InitFont(stbtt_fontinfo *info, const unsigned char *data, int offset)
+{
+ return stbtt_InitFont_internal(info, (unsigned char *) data, offset);
+}
+
+STBTT_DEF int stbtt_FindMatchingFont(const unsigned char *fontdata, const char *name, int flags)
+{
+ return stbtt_FindMatchingFont_internal((unsigned char *) fontdata, (char *) name, flags);
+}
+
+STBTT_DEF int stbtt_CompareUTF8toUTF16_bigendian(const char *s1, int len1, const char *s2, int len2)
+{
+ return stbtt_CompareUTF8toUTF16_bigendian_internal((char *) s1, len1, (char *) s2, len2);
+}
+
+#if defined(__GNUC__) || defined(__clang__)
+#pragma GCC diagnostic pop
+#endif
+
+#endif // STB_TRUETYPE_IMPLEMENTATION
+
+
+// FULL VERSION HISTORY
+//
+// 1.25 (2021-07-11) many fixes
+// 1.24 (2020-02-05) fix warning
+// 1.23 (2020-02-02) query SVG data for glyphs; query whole kerning table (but only kern not GPOS)
+// 1.22 (2019-08-11) minimize missing-glyph duplication; fix kerning if both 'GPOS' and 'kern' are defined
+// 1.21 (2019-02-25) fix warning
+// 1.20 (2019-02-07) PackFontRange skips missing codepoints; GetScaleFontVMetrics()
+// 1.19 (2018-02-11) OpenType GPOS kerning (horizontal only), STBTT_fmod
+// 1.18 (2018-01-29) add missing function
+// 1.17 (2017-07-23) make more arguments const; doc fix
+// 1.16 (2017-07-12) SDF support
+// 1.15 (2017-03-03) make more arguments const
+// 1.14 (2017-01-16) num-fonts-in-TTC function
+// 1.13 (2017-01-02) support OpenType fonts, certain Apple fonts
+// 1.12 (2016-10-25) suppress warnings about casting away const with -Wcast-qual
+// 1.11 (2016-04-02) fix unused-variable warning
+// 1.10 (2016-04-02) allow user-defined fabs() replacement
+// fix memory leak if fontsize=0.0
+// fix warning from duplicate typedef
+// 1.09 (2016-01-16) warning fix; avoid crash on outofmem; use alloc userdata for PackFontRanges
+// 1.08 (2015-09-13) document stbtt_Rasterize(); fixes for vertical & horizontal edges
+// 1.07 (2015-08-01) allow PackFontRanges to accept arrays of sparse codepoints;
+// allow PackFontRanges to pack and render in separate phases;
+// fix stbtt_GetFontOffsetForIndex (never worked for non-0 input?);
+// fixed an assert() bug in the new rasterizer
+// replace assert() with STBTT_assert() in new rasterizer
+// 1.06 (2015-07-14) performance improvements (~35% faster on x86 and x64 on test machine)
+// also more precise AA rasterizer, except if shapes overlap
+// remove need for STBTT_sort
+// 1.05 (2015-04-15) fix misplaced definitions for STBTT_STATIC
+// 1.04 (2015-04-15) typo in example
+// 1.03 (2015-04-12) STBTT_STATIC, fix memory leak in new packing, various fixes
+// 1.02 (2014-12-10) fix various warnings & compile issues w/ stb_rect_pack, C++
+// 1.01 (2014-12-08) fix subpixel position when oversampling to exactly match
+// non-oversampled; STBTT_POINT_SIZE for packed case only
+// 1.00 (2014-12-06) add new PackBegin etc. API, w/ support for oversampling
+// 0.99 (2014-09-18) fix multiple bugs with subpixel rendering (ryg)
+// 0.9 (2014-08-07) support certain mac/iOS fonts without an MS platformID
+// 0.8b (2014-07-07) fix a warning
+// 0.8 (2014-05-25) fix a few more warnings
+// 0.7 (2013-09-25) bugfix: subpixel glyph bug fixed in 0.5 had come back
+// 0.6c (2012-07-24) improve documentation
+// 0.6b (2012-07-20) fix a few more warnings
+// 0.6 (2012-07-17) fix warnings; added stbtt_ScaleForMappingEmToPixels,
+// stbtt_GetFontBoundingBox, stbtt_IsGlyphEmpty
+// 0.5 (2011-12-09) bugfixes:
+// subpixel glyph renderer computed wrong bounding box
+// first vertex of shape can be off-curve (FreeSans)
+// 0.4b (2011-12-03) fixed an error in the font baking example
+// 0.4 (2011-12-01) kerning, subpixel rendering (tor)
+// bugfixes for:
+// codepoint-to-glyph conversion using table fmt=12
+// codepoint-to-glyph conversion using table fmt=4
+// stbtt_GetBakedQuad with non-square texture (Zer)
+// updated Hello World! sample to use kerning and subpixel
+// fixed some warnings
+// 0.3 (2009-06-24) cmap fmt=12, compound shapes (MM)
+// userdata, malloc-from-userdata, non-zero fill (stb)
+// 0.2 (2009-03-11) Fix unsigned/signed char warnings
+// 0.1 (2009-03-09) First public release
+//
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_vorbis.c b/vendor/stb/stb_vorbis.c
new file mode 100644
index 0000000..3e5c250
--- /dev/null
+++ b/vendor/stb/stb_vorbis.c
@@ -0,0 +1,5584 @@
+// Ogg Vorbis audio decoder - v1.22 - public domain
+// http://nothings.org/stb_vorbis/
+//
+// Original version written by Sean Barrett in 2007.
+//
+// Originally sponsored by RAD Game Tools. Seeking implementation
+// sponsored by Phillip Bennefall, Marc Andersen, Aaron Baker,
+// Elias Software, Aras Pranckevicius, and Sean Barrett.
+//
+// LICENSE
+//
+// See end of file for license information.
+//
+// Limitations:
+//
+// - floor 0 not supported (used in old ogg vorbis files pre-2004)
+// - lossless sample-truncation at beginning ignored
+// - cannot concatenate multiple vorbis streams
+// - sample positions are 32-bit, limiting seekable 192kHz
+// files to around 6 hours (Ogg supports 64-bit)
+//
+// Feature contributors:
+// Dougall Johnson (sample-exact seeking)
+//
+// Bugfix/warning contributors:
+// Terje Mathisen Niklas Frykholm Andy Hill
+// Casey Muratori John Bolton Gargaj
+// Laurent Gomila Marc LeBlanc Ronny Chevalier
+// Bernhard Wodo Evan Balster github:alxprd
+// Tom Beaumont Ingo Leitgeb Nicolas Guillemot
+// Phillip Bennefall Rohit Thiago Goulart
+// github:manxorist Saga Musix github:infatum
+// Timur Gagiev Maxwell Koo Peter Waller
+// github:audinowho Dougall Johnson David Reid
+// github:Clownacy Pedro J. Estebanez Remi Verschelde
+// AnthoFoxo github:morlat Gabriel Ravier
+//
+// Partial history:
+// 1.22 - 2021-07-11 - various small fixes
+// 1.21 - 2021-07-02 - fix bug for files with no comments
+// 1.20 - 2020-07-11 - several small fixes
+// 1.19 - 2020-02-05 - warnings
+// 1.18 - 2020-02-02 - fix seek bugs; parse header comments; misc warnings etc.
+// 1.17 - 2019-07-08 - fix CVE-2019-13217..CVE-2019-13223 (by ForAllSecure)
+// 1.16 - 2019-03-04 - fix warnings
+// 1.15 - 2019-02-07 - explicit failure if Ogg Skeleton data is found
+// 1.14 - 2018-02-11 - delete bogus dealloca usage
+// 1.13 - 2018-01-29 - fix truncation of last frame (hopefully)
+// 1.12 - 2017-11-21 - limit residue begin/end to blocksize/2 to avoid large temp allocs in bad/corrupt files
+// 1.11 - 2017-07-23 - fix MinGW compilation
+// 1.10 - 2017-03-03 - more robust seeking; fix negative ilog(); clear error in open_memory
+// 1.09 - 2016-04-04 - back out 'truncation of last frame' fix from previous version
+// 1.08 - 2016-04-02 - warnings; setup memory leaks; truncation of last frame
+// 1.07 - 2015-01-16 - fixes for crashes on invalid files; warning fixes; const
+// 1.06 - 2015-08-31 - full, correct support for seeking API (Dougall Johnson)
+// some crash fixes when out of memory or with corrupt files
+// fix some inappropriately signed shifts
+// 1.05 - 2015-04-19 - don't define __forceinline if it's redundant
+// 1.04 - 2014-08-27 - fix missing const-correct case in API
+// 1.03 - 2014-08-07 - warning fixes
+// 1.02 - 2014-07-09 - declare qsort comparison as explicitly _cdecl in Windows
+// 1.01 - 2014-06-18 - fix stb_vorbis_get_samples_float (interleaved was correct)
+// 1.0 - 2014-05-26 - fix memory leaks; fix warnings; fix bugs in >2-channel;
+// (API change) report sample rate for decode-full-file funcs
+//
+// See end of file for full version history.
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// HEADER BEGINS HERE
+//
+
+#ifndef STB_VORBIS_INCLUDE_STB_VORBIS_H
+#define STB_VORBIS_INCLUDE_STB_VORBIS_H
+
+#if defined(STB_VORBIS_NO_CRT) && !defined(STB_VORBIS_NO_STDIO)
+#define STB_VORBIS_NO_STDIO 1
+#endif
+
+#ifndef STB_VORBIS_NO_STDIO
+#include <stdio.h>
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/////////// THREAD SAFETY
+
+// Individual stb_vorbis* handles are not thread-safe; you cannot decode from
+// them from multiple threads at the same time. However, you can have multiple
+// stb_vorbis* handles and decode from them independently in multiple threads.
+
+
+/////////// MEMORY ALLOCATION
+
+// normally stb_vorbis uses malloc() to allocate memory at startup,
+// and alloca() to allocate temporary memory during a frame on the
+// stack. (Memory consumption will depend on the amount of setup
+// data in the file and how you set the compile flags for speed
+// vs. size. In my test files the maximal-size usage is ~150KB.)
+//
+// You can modify the wrapper functions in the source (setup_malloc,
+// setup_temp_malloc, temp_malloc) to change this behavior, or you
+// can use a simpler allocation model: you pass in a buffer from
+// which stb_vorbis will allocate _all_ its memory (including the
+// temp memory). "open" may fail with a VORBIS_outofmem if you
+// do not pass in enough data; there is no way to determine how
+// much you do need except to succeed (at which point you can
+// query get_info to find the exact amount required. yes I know
+// this is lame).
+//
+// If you pass in a non-NULL buffer of the type below, allocation
+// will occur from it as described above. Otherwise just pass NULL
+// to use malloc()/alloca()
+
+typedef struct
+{
+ char *alloc_buffer;
+ int alloc_buffer_length_in_bytes;
+} stb_vorbis_alloc;
+
+
+/////////// FUNCTIONS USEABLE WITH ALL INPUT MODES
+
+typedef struct stb_vorbis stb_vorbis;
+
+typedef struct
+{
+ unsigned int sample_rate;
+ int channels;
+
+ unsigned int setup_memory_required;
+ unsigned int setup_temp_memory_required;
+ unsigned int temp_memory_required;
+
+ int max_frame_size;
+} stb_vorbis_info;
+
+typedef struct
+{
+ char *vendor;
+
+ int comment_list_length;
+ char **comment_list;
+} stb_vorbis_comment;
+
+// get general information about the file
+extern stb_vorbis_info stb_vorbis_get_info(stb_vorbis *f);
+
+// get ogg comments
+extern stb_vorbis_comment stb_vorbis_get_comment(stb_vorbis *f);
+
+// get the last error detected (clears it, too)
+extern int stb_vorbis_get_error(stb_vorbis *f);
+
+// close an ogg vorbis file and free all memory in use
+extern void stb_vorbis_close(stb_vorbis *f);
+
+// this function returns the offset (in samples) from the beginning of the
+// file that will be returned by the next decode, if it is known, or -1
+// otherwise. after a flush_pushdata() call, this may take a while before
+// it becomes valid again.
+// NOT WORKING YET after a seek with PULLDATA API
+extern int stb_vorbis_get_sample_offset(stb_vorbis *f);
+
+// returns the current seek point within the file, or offset from the beginning
+// of the memory buffer. In pushdata mode it returns 0.
+extern unsigned int stb_vorbis_get_file_offset(stb_vorbis *f);
+
+/////////// PUSHDATA API
+
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+
+// this API allows you to get blocks of data from any source and hand
+// them to stb_vorbis. you have to buffer them; stb_vorbis will tell
+// you how much it used, and you have to give it the rest next time;
+// and stb_vorbis may not have enough data to work with and you will
+// need to give it the same data again PLUS more. Note that the Vorbis
+// specification does not bound the size of an individual frame.
+
+extern stb_vorbis *stb_vorbis_open_pushdata(
+ const unsigned char * datablock, int datablock_length_in_bytes,
+ int *datablock_memory_consumed_in_bytes,
+ int *error,
+ const stb_vorbis_alloc *alloc_buffer);
+// create a vorbis decoder by passing in the initial data block containing
+// the ogg & vorbis headers (you don't need to parse them, just provide
+// the first N bytes of the file--you're told if it's not enough, see below)
+// on success, returns an stb_vorbis *, does not set error, returns the amount of
+// data parsed/consumed on this call in *datablock_memory_consumed_in_bytes;
+// on failure, returns NULL and sets *error; does not change *datablock_memory_consumed_in_bytes
+// if returns NULL and *error is VORBIS_need_more_data, then the input block was
+// incomplete and you need to pass in a larger block from the start of the file
+
+extern int stb_vorbis_decode_frame_pushdata(
+ stb_vorbis *f,
+ const unsigned char *datablock, int datablock_length_in_bytes,
+ int *channels, // place to write number of float * buffers
+ float ***output, // place to write float ** array of float * buffers
+ int *samples // place to write number of output samples
+ );
+// decode a frame of audio sample data if possible from the passed-in data block
+//
+// return value: number of bytes we used from datablock
+//
+// possible cases:
+// 0 bytes used, 0 samples output (need more data)
+// N bytes used, 0 samples output (resynching the stream, keep going)
+// N bytes used, M samples output (one frame of data)
+// note that after opening a file, you will ALWAYS get one N-bytes,0-sample
+// frame, because Vorbis always "discards" the first frame.
+//
+// Note that on resynch, stb_vorbis will rarely consume all of the buffer,
+// instead only datablock_length_in_bytes-3 or less. This is because it wants
+// to avoid missing parts of a page header if they cross a datablock boundary,
+// without writing state-machiney code to record a partial detection.
+//
+// The number of channels is stored in *channels (which can be
+// NULL--it is always the same as the number of channels reported by
+// get_info). *output will contain an array of float* buffers, one per
+// channel. In other words, (*output)[0][0] contains the first sample from
+// the first channel, and (*output)[1][0] contains the first sample from
+// the second channel.
+//
+// *output points into stb_vorbis's internal output buffer storage; these
+// buffers are owned by stb_vorbis and application code should not free
+// them or modify their contents. They are transient and will be overwritten
+// once you ask for more data to get decoded, so be sure to grab any data
+// you need before then.
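The consume/refill contract above can be sketched as follows (hypothetical driver loop, not library code; `read_more` and `process` are assumed caller-side functions, and `buf`/`buflen` is the caller-managed buffer):

```c
// Hypothetical pushdata decode loop.
for(;;) {
   int channels, samples;
   float **outputs;
   int used = stb_vorbis_decode_frame_pushdata(v, buf, buflen,
                                               &channels, &outputs, &samples);
   if (used == 0) {
      buflen = read_more(buf);              // need more data: grow the buffer
      continue;
   }
   memmove(buf, buf+used, buflen-used);     // drop the consumed bytes
   buflen -= used;
   if (samples == 0) continue;              // resynching the stream; keep going
   process(outputs, channels, samples);     // outputs valid only until next call
}
```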
+
+extern void stb_vorbis_flush_pushdata(stb_vorbis *f);
+// inform stb_vorbis that your next datablock will not be contiguous with
+// previous ones (e.g. you've seeked in the data); future attempts to decode
+// frames will cause stb_vorbis to resynchronize (as noted above), and
+// once it sees a valid Ogg page (typically 4-8KB, as large as 64KB), it
+// will begin decoding the _next_ frame.
+//
+// if you want to seek using pushdata, you need to seek in your file, then
+// call stb_vorbis_flush_pushdata(), then start calling decoding, then once
+// decoding is returning you data, call stb_vorbis_get_sample_offset, and
+// if you don't like the result, seek your file again and repeat.
+#endif
+
+
+////////// PULLING INPUT API
+
+#ifndef STB_VORBIS_NO_PULLDATA_API
+// This API assumes stb_vorbis is allowed to pull data from a source--
+// either a block of memory containing the _entire_ vorbis stream, or a
+// FILE * that you or it create, or possibly some other reading mechanism
+// if you go modify the source to replace the FILE * case with some kind
+// of callback to your code. (But if you don't support seeking, you may
+// just want to go ahead and use pushdata.)
+
+#if !defined(STB_VORBIS_NO_STDIO) && !defined(STB_VORBIS_NO_INTEGER_CONVERSION)
+extern int stb_vorbis_decode_filename(const char *filename, int *channels, int *sample_rate, short **output);
+#endif
+#if !defined(STB_VORBIS_NO_INTEGER_CONVERSION)
+extern int stb_vorbis_decode_memory(const unsigned char *mem, int len, int *channels, int *sample_rate, short **output);
+#endif
+// decode an entire file and output the data interleaved into a malloc()ed
+// buffer stored in *output. The return value is the number of samples
+// decoded, or -1 if the file could not be opened or was not an ogg vorbis file.
+// When you're done with it, just free() the pointer returned in *output.
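A sketch of the simplest possible use (the filename is hypothetical):

```c
// Decode a whole file into a malloc()ed interleaved short buffer.
int channels, sample_rate;
short *samples;
int n = stb_vorbis_decode_filename("song.ogg", &channels, &sample_rate, &samples);
if (n >= 0) {
   // ... use 'samples' (interleaved, 'channels' channels) ...
   free(samples);
}
```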
+
+extern stb_vorbis * stb_vorbis_open_memory(const unsigned char *data, int len,
+ int *error, const stb_vorbis_alloc *alloc_buffer);
+// create an ogg vorbis decoder from an ogg vorbis stream in memory (note
+// this must be the entire stream!). on failure, returns NULL and sets *error
+
+#ifndef STB_VORBIS_NO_STDIO
+extern stb_vorbis * stb_vorbis_open_filename(const char *filename,
+ int *error, const stb_vorbis_alloc *alloc_buffer);
+// create an ogg vorbis decoder from a filename via fopen(). on failure,
+// returns NULL and sets *error (possibly to VORBIS_file_open_failure).
+
+extern stb_vorbis * stb_vorbis_open_file(FILE *f, int close_handle_on_close,
+ int *error, const stb_vorbis_alloc *alloc_buffer);
+// create an ogg vorbis decoder from an open FILE *, looking for a stream at
+// the _current_ seek point (ftell). on failure, returns NULL and sets *error.
+// note that stb_vorbis must "own" this stream; if you seek it in between
+// calls to stb_vorbis, it will become confused. Moreover, if you attempt to
+// perform stb_vorbis_seek_*() operations on this file, it will assume it
+// owns the _entire_ rest of the file after the start point. Use the next
+// function, stb_vorbis_open_file_section(), to limit it.
+
+extern stb_vorbis * stb_vorbis_open_file_section(FILE *f, int close_handle_on_close,
+ int *error, const stb_vorbis_alloc *alloc_buffer, unsigned int len);
+// create an ogg vorbis decoder from an open FILE *, looking for a stream at
+// the _current_ seek point (ftell); the stream will be of length 'len' bytes.
+// on failure, returns NULL and sets *error. note that stb_vorbis must "own"
+// this stream; if you seek it in between calls to stb_vorbis, it will become
+// confused.
+#endif
+
+extern int stb_vorbis_seek_frame(stb_vorbis *f, unsigned int sample_number);
+extern int stb_vorbis_seek(stb_vorbis *f, unsigned int sample_number);
+// these functions seek in the Vorbis file to (approximately) 'sample_number'.
+// after calling seek_frame(), the next call to get_frame_*() will include
+// the specified sample. after calling stb_vorbis_seek(), the next call to
+// stb_vorbis_get_samples_* will start with the specified sample. If you
+// do not need to seek to EXACTLY the target sample when using get_samples_*,
+// you can also use seek_frame().
+
+extern int stb_vorbis_seek_start(stb_vorbis *f);
+// this function is equivalent to stb_vorbis_seek(f,0)
+
+extern unsigned int stb_vorbis_stream_length_in_samples(stb_vorbis *f);
+extern float stb_vorbis_stream_length_in_seconds(stb_vorbis *f);
+// these functions return the total length of the vorbis stream
+
+extern int stb_vorbis_get_frame_float(stb_vorbis *f, int *channels, float ***output);
+// decode the next frame and return the number of samples. the number of
+// channels is stored in *channels (which can be NULL--it is always
+// the same as the number of channels reported by get_info). *output will
+// contain an array of float* buffers, one per channel. These outputs will
+// be overwritten on the next call to stb_vorbis_get_frame_*.
+//
+// You generally should not intermix calls to stb_vorbis_get_frame_*()
+// and stb_vorbis_get_samples_*(), since the latter calls the former.
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+extern int stb_vorbis_get_frame_short_interleaved(stb_vorbis *f, int num_c, short *buffer, int num_shorts);
+extern int stb_vorbis_get_frame_short (stb_vorbis *f, int num_c, short **buffer, int num_samples);
+#endif
+// decode the next frame and return the number of *samples* per channel.
+// Note that for interleaved data, you pass in the number of shorts (the
+// size of your array), but the return value is the number of samples per
+// channel, not the total number of samples.
+//
+// The data is coerced to the number of channels you request according to the
+// channel coercion rules (see below). You must pass in the size of your
+// buffer(s) so that stb_vorbis will not overwrite the end of the buffer.
+// The maximum buffer size needed can be gotten from get_info(); however,
+// the Vorbis I specification implies an absolute maximum of 4096 samples
+// per channel.
+
+// Channel coercion rules:
+// Let M be the number of channels requested, and N the number of channels present,
+// and Cn be the nth channel; let stereo L be the sum of all L and center channels,
+// and stereo R be the sum of all R and center channels (channel assignment from the
+// vorbis spec).
+// M N output
+// 1 k sum(Ck) for all k
+// 2 * stereo L, stereo R
+// k l k > l, the first l channels, then 0s
+// k l k <= l, the first k channels
+// Note that this is not _good_ surround etc. mixing at all! It's just so
+// you get something useful.
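For illustration, the simple rows of the table above can be re-implemented directly; this is a standalone sketch, not the library's own code (the M=2 stereo row depends on the spec's channel-position assignments and is omitted here):

```c
#include <stddef.h>

// Illustrative re-implementation of the simple coercion rows above:
// M=1 sums all N source channels; otherwise the first min(M,N) channels
// are copied and any extra output channels are zero-filled.
static void coerce_channels(float **out, int M, float **in, int N, int len)
{
   int c, i;
   if (M == 1) {                      // M=1: sum(Ck) for all k
      for (i=0; i < len; ++i) {
         out[0][i] = 0;
         for (c=0; c < N; ++c) out[0][i] += in[c][i];
      }
   } else {                           // M>N: copy then 0s; M<=N: truncate
      for (c=0; c < M; ++c)
         for (i=0; i < len; ++i)
            out[c][i] = (c < N) ? in[c][i] : 0.0f;
   }
}
```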
+
+extern int stb_vorbis_get_samples_float_interleaved(stb_vorbis *f, int channels, float *buffer, int num_floats);
+extern int stb_vorbis_get_samples_float(stb_vorbis *f, int channels, float **buffer, int num_samples);
+// gets num_samples samples, not necessarily on a frame boundary--this requires
+// buffering so you have to supply the buffers. DOES NOT APPLY THE COERCION RULES.
+// Returns the number of samples stored per channel; it may be less than requested
+// at the end of the file. If there are no more samples in the file, returns 0.
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+extern int stb_vorbis_get_samples_short_interleaved(stb_vorbis *f, int channels, short *buffer, int num_shorts);
+extern int stb_vorbis_get_samples_short(stb_vorbis *f, int channels, short **buffer, int num_samples);
+#endif
+// gets num_samples samples, not necessarily on a frame boundary--this requires
+// buffering so you have to supply the buffers. Applies the coercion rules above
+// to produce 'channels' channels. Returns the number of samples stored per channel;
+// it may be less than requested at the end of the file. If there are no more
+// samples in the file, returns 0.
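A sketch of the streaming pull loop the comment above describes (hypothetical; `v` is an open decoder and `write_audio` is an assumed caller-side output function):

```c
// Hypothetical streaming loop: pull up to 4096 samples per channel
// of interleaved stereo shorts at a time until end of file.
#define CHUNK 4096
short buf[2*CHUNK];
for(;;) {
   int n = stb_vorbis_get_samples_short_interleaved(v, 2, buf, 2*CHUNK);
   if (n == 0) break;           // no more samples in the file
   write_audio(buf, 2*n);       // n samples per channel were stored
}
```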
+
+#endif
+
+//////// ERROR CODES
+
+enum STBVorbisError
+{
+ VORBIS__no_error,
+
+ VORBIS_need_more_data=1, // not a real error
+
+ VORBIS_invalid_api_mixing, // can't mix API modes
+ VORBIS_outofmem, // not enough memory
+ VORBIS_feature_not_supported, // uses floor 0
+ VORBIS_too_many_channels, // STB_VORBIS_MAX_CHANNELS is too small
+ VORBIS_file_open_failure, // fopen() failed
+ VORBIS_seek_without_length, // can't seek in unknown-length file
+
+ VORBIS_unexpected_eof=10, // file is truncated?
+ VORBIS_seek_invalid, // seek past EOF
+
+ // decoding errors (corrupt/invalid stream) -- you probably
+ // don't care about the exact details of these
+
+ // vorbis errors:
+ VORBIS_invalid_setup=20,
+ VORBIS_invalid_stream,
+
+ // ogg errors:
+ VORBIS_missing_capture_pattern=30,
+ VORBIS_invalid_stream_structure_version,
+ VORBIS_continued_packet_flag_invalid,
+ VORBIS_incorrect_stream_serial_number,
+ VORBIS_invalid_first_page,
+ VORBIS_bad_packet_type,
+ VORBIS_cant_find_last_page,
+ VORBIS_seek_failed,
+ VORBIS_ogg_skeleton_not_supported
+};
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // STB_VORBIS_INCLUDE_STB_VORBIS_H
+//
+// HEADER ENDS HERE
+//
+//////////////////////////////////////////////////////////////////////////////
+
+#ifndef STB_VORBIS_HEADER_ONLY
+
+// global configuration settings (e.g. set these in the project/makefile),
+// or just set them in this file at the top (although ideally the first few
+// should be visible when the header file is compiled too, though it's not
+// crucial)
+
+// STB_VORBIS_NO_PUSHDATA_API
+// does not compile the code for the various stb_vorbis_*_pushdata()
+// functions
+// #define STB_VORBIS_NO_PUSHDATA_API
+
+// STB_VORBIS_NO_PULLDATA_API
+// does not compile the code for the non-pushdata APIs
+// #define STB_VORBIS_NO_PULLDATA_API
+
+// STB_VORBIS_NO_STDIO
+// does not compile the code for the APIs that use FILE *s internally
+// or externally (implied by STB_VORBIS_NO_PULLDATA_API)
+// #define STB_VORBIS_NO_STDIO
+
+// STB_VORBIS_NO_INTEGER_CONVERSION
+// does not compile the code for converting audio sample data from
+// float to integer (implied by STB_VORBIS_NO_PULLDATA_API)
+// #define STB_VORBIS_NO_INTEGER_CONVERSION
+
+// STB_VORBIS_NO_FAST_SCALED_FLOAT
+// disables the fast float-to-int trick used on most platforms to
+// accelerate float-to-int conversion; the trick requires endianness
+// to be defined correctly.
+//#define STB_VORBIS_NO_FAST_SCALED_FLOAT
+
+
+// STB_VORBIS_MAX_CHANNELS [number]
+// globally define this to the maximum number of channels you need.
+// The spec does not put a restriction on channels except that
+// the count is stored in a byte, so 255 is the hard limit.
+// Reducing this saves about 16 bytes per value, so using 16 saves
+// (255-16)*16 or around 4KB, plus any other memory usage
+// I forgot to account for. Can probably go as low as 8 (7.1 audio),
+// 6 (5.1 audio), or 2 (stereo only).
+#ifndef STB_VORBIS_MAX_CHANNELS
+#define STB_VORBIS_MAX_CHANNELS 16 // enough for anyone?
+#endif
+
+// STB_VORBIS_PUSHDATA_CRC_COUNT [number]
+// after a flush_pushdata(), stb_vorbis begins scanning for the
+// next valid page, without backtracking. when it finds something
+// that looks like a page, it streams through it and verifies its
+// CRC32. Should that validation fail, it keeps scanning. But it's
+// possible that _while_ streaming through to check the CRC32 of
+// one candidate page, it sees another candidate page. This #define
+// determines how many "overlapping" candidate pages it can search
+// at once. Note that "real" pages are typically ~4KB to ~8KB, whereas
+// garbage pages could be as big as 64KB, but probably average ~16KB.
+// To avoid hosing ourselves by scanning an apparent 64KB page and
+// missing a ton of real ones in the interim, the minimum is 2.
+#ifndef STB_VORBIS_PUSHDATA_CRC_COUNT
+#define STB_VORBIS_PUSHDATA_CRC_COUNT 4
+#endif
+
+// STB_VORBIS_FAST_HUFFMAN_LENGTH [number]
+// sets the log size of the huffman-acceleration table. Maximum
+// supported value is 24. with larger numbers, more decodings are O(1),
+// but the table is larger, so cache behavior is worse; you'll have to
+// experiment (and try multiple ogg vorbis files) to find the sweet spot.
+#ifndef STB_VORBIS_FAST_HUFFMAN_LENGTH
+#define STB_VORBIS_FAST_HUFFMAN_LENGTH 10
+#endif
+
+// STB_VORBIS_FAST_BINARY_LENGTH [number]
+// sets the log size of the binary-search acceleration table. this
+// is used in similar fashion to the fast-huffman size to set initial
+// parameters for the binary search
+
+// STB_VORBIS_FAST_HUFFMAN_INT
+// The fast huffman tables are much more efficient if they can be
+// stored as 16-bit results instead of 32-bit results. This restricts
+// the codebooks to having only 65535 possible outcomes, though.
+// (At least, accelerated by the huffman table.)
+#ifndef STB_VORBIS_FAST_HUFFMAN_INT
+#define STB_VORBIS_FAST_HUFFMAN_SHORT
+#endif
+
+// STB_VORBIS_NO_HUFFMAN_BINARY_SEARCH
+// If the 'fast huffman' search doesn't succeed, then stb_vorbis falls
+// back on binary searching for the correct one. This requires storing
+// extra tables with the huffman codes in sorted order. Defining this
+// symbol trades off space for speed by forcing a linear search in the
+// non-fast case, except for "sparse" codebooks.
+// #define STB_VORBIS_NO_HUFFMAN_BINARY_SEARCH
+
+// STB_VORBIS_DIVIDES_IN_RESIDUE
+// stb_vorbis precomputes the result of the scalar residue decoding
+// that would otherwise require a divide per chunk. defining this
+// symbol skips the precomputed tables, saving that space at the cost
+// of a divide per chunk.
+// #define STB_VORBIS_DIVIDES_IN_RESIDUE
+
+// STB_VORBIS_DIVIDES_IN_CODEBOOK
+// vorbis VQ codebooks can be encoded two ways: with every case explicitly
+// stored, or with all elements being chosen from a small range of values,
+// and all values possible in all elements. By default, stb_vorbis expands
+// this latter kind out to look like the former kind for ease of decoding,
+// because otherwise an integer divide-per-vector-element is required to
+// unpack the index. If you define STB_VORBIS_DIVIDES_IN_CODEBOOK, you can
+// trade off storage for speed.
+//#define STB_VORBIS_DIVIDES_IN_CODEBOOK
+
+#ifdef STB_VORBIS_CODEBOOK_SHORTS
+#error "STB_VORBIS_CODEBOOK_SHORTS is no longer supported as it produced incorrect results for some input formats"
+#endif
+
+// STB_VORBIS_DIVIDE_TABLE
+// this replaces small integer divides in the floor decode loop with
+// table lookups. made less than 1% difference, so disabled by default.
+
+// STB_VORBIS_NO_INLINE_DECODE
+// disables the inlining of the scalar codebook fast-huffman decode.
+// might save a little codespace; useful for debugging
+// #define STB_VORBIS_NO_INLINE_DECODE
+
+// STB_VORBIS_NO_DEFER_FLOOR
+// Normally we only decode the floor without synthesizing the actual
+// full curve. We can instead synthesize the curve immediately. This
+// requires more memory and is very likely slower, so I don't think
+// you'd ever want to do it except for debugging.
+// #define STB_VORBIS_NO_DEFER_FLOOR
+
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+
+#ifdef STB_VORBIS_NO_PULLDATA_API
+ #define STB_VORBIS_NO_INTEGER_CONVERSION
+ #define STB_VORBIS_NO_STDIO
+#endif
+
+#if defined(STB_VORBIS_NO_CRT) && !defined(STB_VORBIS_NO_STDIO)
+ #define STB_VORBIS_NO_STDIO 1
+#endif
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+#ifndef STB_VORBIS_NO_FAST_SCALED_FLOAT
+
+ // only need endianness for fast-float-to-int, which we don't
+ // use for pushdata
+
+ #ifndef STB_VORBIS_BIG_ENDIAN
+ #define STB_VORBIS_ENDIAN 0
+ #else
+ #define STB_VORBIS_ENDIAN 1
+ #endif
+
+#endif
+#endif
+
+
+#ifndef STB_VORBIS_NO_STDIO
+#include <stdio.h>
+#endif
+
+#ifndef STB_VORBIS_NO_CRT
+   #include <stdlib.h>
+   #include <string.h>
+   #include <assert.h>
+   #include <math.h>
+
+ // find definition of alloca if it's not in stdlib.h:
+ #if defined(_MSC_VER) || defined(__MINGW32__)
+      #include <malloc.h>
+ #endif
+ #if defined(__linux__) || defined(__linux) || defined(__sun__) || defined(__EMSCRIPTEN__) || defined(__NEWLIB__)
+      #include <alloca.h>
+ #endif
+#else // STB_VORBIS_NO_CRT
+ #define NULL 0
+ #define malloc(s) 0
+ #define free(s) ((void) 0)
+ #define realloc(s) 0
+#endif // STB_VORBIS_NO_CRT
+
+#include <limits.h>
+
+#ifdef __MINGW32__
+ // eff you mingw:
+ // "fixed":
+ // http://sourceforge.net/p/mingw-w64/mailman/message/32882927/
+ // "no that broke the build, reverted, who cares about C":
+ // http://sourceforge.net/p/mingw-w64/mailman/message/32890381/
+ #ifdef __forceinline
+ #undef __forceinline
+ #endif
+ #define __forceinline
+ #ifndef alloca
+ #define alloca __builtin_alloca
+ #endif
+#elif !defined(_MSC_VER)
+ #if __GNUC__
+ #define __forceinline inline
+ #else
+ #define __forceinline
+ #endif
+#endif
+
+#if STB_VORBIS_MAX_CHANNELS > 256
+#error "Value of STB_VORBIS_MAX_CHANNELS outside of allowed range"
+#endif
+
+#if STB_VORBIS_FAST_HUFFMAN_LENGTH > 24
+#error "Value of STB_VORBIS_FAST_HUFFMAN_LENGTH outside of allowed range"
+#endif
+
+
+#if 0
+#include <crtdbg.h>
+#define CHECK(f) _CrtIsValidHeapPointer(f->channel_buffers[1])
+#else
+#define CHECK(f) ((void) 0)
+#endif
+
+#define MAX_BLOCKSIZE_LOG 13 // from specification
+#define MAX_BLOCKSIZE (1 << MAX_BLOCKSIZE_LOG)
+
+
+typedef unsigned char uint8;
+typedef signed char int8;
+typedef unsigned short uint16;
+typedef signed short int16;
+typedef unsigned int uint32;
+typedef signed int int32;
+
+#ifndef TRUE
+#define TRUE 1
+#define FALSE 0
+#endif
+
+typedef float codetype;
+
+#ifdef _MSC_VER
+#define STBV_NOTUSED(v) (void)(v)
+#else
+#define STBV_NOTUSED(v) (void)sizeof(v)
+#endif
+
+// @NOTE
+//
+// Some arrays below are tagged "//varies", which means it's actually
+// a variable-sized piece of data, but rather than malloc I assume it's
+// small enough it's better to just allocate it all together with the
+// main thing
+//
+// Most of the variables are specified with the smallest size I could pack
+// them into. It might give better performance to make them all full-sized
+// integers. It should be safe to freely rearrange the structures or change
+// the sizes larger--nothing relies on silently truncating etc., nor the
+// order of variables.
+
+#define FAST_HUFFMAN_TABLE_SIZE (1 << STB_VORBIS_FAST_HUFFMAN_LENGTH)
+#define FAST_HUFFMAN_TABLE_MASK (FAST_HUFFMAN_TABLE_SIZE - 1)
+
+typedef struct
+{
+ int dimensions, entries;
+ uint8 *codeword_lengths;
+ float minimum_value;
+ float delta_value;
+ uint8 value_bits;
+ uint8 lookup_type;
+ uint8 sequence_p;
+ uint8 sparse;
+ uint32 lookup_values;
+ codetype *multiplicands;
+ uint32 *codewords;
+ #ifdef STB_VORBIS_FAST_HUFFMAN_SHORT
+ int16 fast_huffman[FAST_HUFFMAN_TABLE_SIZE];
+ #else
+ int32 fast_huffman[FAST_HUFFMAN_TABLE_SIZE];
+ #endif
+ uint32 *sorted_codewords;
+ int *sorted_values;
+ int sorted_entries;
+} Codebook;
+
+typedef struct
+{
+ uint8 order;
+ uint16 rate;
+ uint16 bark_map_size;
+ uint8 amplitude_bits;
+ uint8 amplitude_offset;
+ uint8 number_of_books;
+ uint8 book_list[16]; // varies
+} Floor0;
+
+typedef struct
+{
+ uint8 partitions;
+ uint8 partition_class_list[32]; // varies
+ uint8 class_dimensions[16]; // varies
+ uint8 class_subclasses[16]; // varies
+ uint8 class_masterbooks[16]; // varies
+ int16 subclass_books[16][8]; // varies
+ uint16 Xlist[31*8+2]; // varies
+ uint8 sorted_order[31*8+2];
+ uint8 neighbors[31*8+2][2];
+ uint8 floor1_multiplier;
+ uint8 rangebits;
+ int values;
+} Floor1;
+
+typedef union
+{
+ Floor0 floor0;
+ Floor1 floor1;
+} Floor;
+
+typedef struct
+{
+ uint32 begin, end;
+ uint32 part_size;
+ uint8 classifications;
+ uint8 classbook;
+ uint8 **classdata;
+ int16 (*residue_books)[8];
+} Residue;
+
+typedef struct
+{
+ uint8 magnitude;
+ uint8 angle;
+ uint8 mux;
+} MappingChannel;
+
+typedef struct
+{
+ uint16 coupling_steps;
+ MappingChannel *chan;
+ uint8 submaps;
+ uint8 submap_floor[15]; // varies
+ uint8 submap_residue[15]; // varies
+} Mapping;
+
+typedef struct
+{
+ uint8 blockflag;
+ uint8 mapping;
+ uint16 windowtype;
+ uint16 transformtype;
+} Mode;
+
+typedef struct
+{
+ uint32 goal_crc; // expected crc if match
+ int bytes_left; // bytes left in packet
+ uint32 crc_so_far; // running crc
+ int bytes_done; // bytes processed in _current_ chunk
+ uint32 sample_loc; // granule pos encoded in page
+} CRCscan;
+
+typedef struct
+{
+ uint32 page_start, page_end;
+ uint32 last_decoded_sample;
+} ProbedPage;
+
+struct stb_vorbis
+{
+ // user-accessible info
+ unsigned int sample_rate;
+ int channels;
+
+ unsigned int setup_memory_required;
+ unsigned int temp_memory_required;
+ unsigned int setup_temp_memory_required;
+
+ char *vendor;
+ int comment_list_length;
+ char **comment_list;
+
+ // input config
+#ifndef STB_VORBIS_NO_STDIO
+ FILE *f;
+ uint32 f_start;
+ int close_on_free;
+#endif
+
+ uint8 *stream;
+ uint8 *stream_start;
+ uint8 *stream_end;
+
+ uint32 stream_len;
+
+ uint8 push_mode;
+
+ // the page to seek to when seeking to start, may be zero
+ uint32 first_audio_page_offset;
+
+ // p_first is the page on which the first audio packet ends
+ // (but not necessarily the page on which it starts)
+ ProbedPage p_first, p_last;
+
+ // memory management
+ stb_vorbis_alloc alloc;
+ int setup_offset;
+ int temp_offset;
+
+ // run-time results
+ int eof;
+ enum STBVorbisError error;
+
+ // user-useful data
+
+ // header info
+ int blocksize[2];
+ int blocksize_0, blocksize_1;
+ int codebook_count;
+ Codebook *codebooks;
+ int floor_count;
+ uint16 floor_types[64]; // varies
+ Floor *floor_config;
+ int residue_count;
+ uint16 residue_types[64]; // varies
+ Residue *residue_config;
+ int mapping_count;
+ Mapping *mapping;
+ int mode_count;
+ Mode mode_config[64]; // varies
+
+ uint32 total_samples;
+
+ // decode buffer
+ float *channel_buffers[STB_VORBIS_MAX_CHANNELS];
+ float *outputs [STB_VORBIS_MAX_CHANNELS];
+
+ float *previous_window[STB_VORBIS_MAX_CHANNELS];
+ int previous_length;
+
+ #ifndef STB_VORBIS_NO_DEFER_FLOOR
+ int16 *finalY[STB_VORBIS_MAX_CHANNELS];
+ #else
+ float *floor_buffers[STB_VORBIS_MAX_CHANNELS];
+ #endif
+
+ uint32 current_loc; // sample location of next frame to decode
+ int current_loc_valid;
+
+ // per-blocksize precomputed data
+
+ // twiddle factors
+ float *A[2],*B[2],*C[2];
+ float *window[2];
+ uint16 *bit_reverse[2];
+
+ // current page/packet/segment streaming info
+ uint32 serial; // stream serial number for verification
+ int last_page;
+ int segment_count;
+ uint8 segments[255];
+ uint8 page_flag;
+ uint8 bytes_in_seg;
+ uint8 first_decode;
+ int next_seg;
+ int last_seg; // flag that we're on the last segment
+ int last_seg_which; // what was the segment number of the last seg?
+ uint32 acc;
+ int valid_bits;
+ int packet_bytes;
+ int end_seg_with_known_loc;
+ uint32 known_loc_for_packet;
+ int discard_samples_deferred;
+ uint32 samples_output;
+
+ // push mode scanning
+ int page_crc_tests; // only in push_mode: number of tests active; -1 if not searching
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+ CRCscan scan[STB_VORBIS_PUSHDATA_CRC_COUNT];
+#endif
+
+ // sample-access
+ int channel_buffer_start;
+ int channel_buffer_end;
+};
+
+#if defined(STB_VORBIS_NO_PUSHDATA_API)
+ #define IS_PUSH_MODE(f) FALSE
+#elif defined(STB_VORBIS_NO_PULLDATA_API)
+ #define IS_PUSH_MODE(f) TRUE
+#else
+ #define IS_PUSH_MODE(f) ((f)->push_mode)
+#endif
+
+typedef struct stb_vorbis vorb;
+
+static int error(vorb *f, enum STBVorbisError e)
+{
+ f->error = e;
+ if (!f->eof && e != VORBIS_need_more_data) {
+ f->error=e; // breakpoint for debugging
+ }
+ return 0;
+}
+
+
+// these functions are used for allocating temporary memory
+// while decoding. if you can afford the stack space, use
+// alloca(); otherwise, provide a temp buffer and it will
+// allocate out of those.
+
+#define array_size_required(count,size) ((count)*(sizeof(void *)+(size)))
+
+#define temp_alloc(f,size) (f->alloc.alloc_buffer ? setup_temp_malloc(f,size) : alloca(size))
+#define temp_free(f,p) (void)0
+#define temp_alloc_save(f) ((f)->temp_offset)
+#define temp_alloc_restore(f,p) ((f)->temp_offset = (p))
+
+#define temp_block_array(f,count,size) make_block_array(temp_alloc(f,array_size_required(count,size)), count, size)
+
+// given a sufficiently large block of memory, make an array of pointers to subblocks of it
+static void *make_block_array(void *mem, int count, int size)
+{
+ int i;
+ void ** p = (void **) mem;
+ char *q = (char *) (p + count);
+ for (i=0; i < count; ++i) {
+ p[i] = q;
+ q += size;
+ }
+ return p;
+}
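The layout this produces is a pointer table followed by the subblocks it points into. A standalone copy (renamed to avoid clashing with the definition above) makes that concrete:

```c
#include <string.h>

// Same algorithm as make_block_array above, copied under a different
// name so it can be exercised on its own: 'mem' holds 'count' pointers
// followed by 'count' subblocks of 'size' bytes each.
static void *demo_make_block_array(void *mem, int count, int size)
{
   int i;
   void **p = (void **) mem;
   char *q = (char *) (p + count);
   for (i=0; i < count; ++i) {
      p[i] = q;
      q += size;
   }
   return p;
}
```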
+
+static void *setup_malloc(vorb *f, int sz)
+{
+ sz = (sz+7) & ~7; // round up to nearest 8 for alignment of future allocs.
+ f->setup_memory_required += sz;
+ if (f->alloc.alloc_buffer) {
+ void *p = (char *) f->alloc.alloc_buffer + f->setup_offset;
+ if (f->setup_offset + sz > f->temp_offset) return NULL;
+ f->setup_offset += sz;
+ return p;
+ }
+ return sz ? malloc(sz) : NULL;
+}
+
+static void setup_free(vorb *f, void *p)
+{
+ if (f->alloc.alloc_buffer) return; // do nothing; setup mem is a stack
+ free(p);
+}
+
+static void *setup_temp_malloc(vorb *f, int sz)
+{
+ sz = (sz+7) & ~7; // round up to nearest 8 for alignment of future allocs.
+ if (f->alloc.alloc_buffer) {
+ if (f->temp_offset - sz < f->setup_offset) return NULL;
+ f->temp_offset -= sz;
+ return (char *) f->alloc.alloc_buffer + f->temp_offset;
+ }
+ return malloc(sz);
+}
+
+static void setup_temp_free(vorb *f, void *p, int sz)
+{
+ if (f->alloc.alloc_buffer) {
+ f->temp_offset += (sz+7)&~7;
+ return;
+ }
+ free(p);
+}
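Together, setup_malloc and setup_temp_malloc implement a two-ended arena: long-lived setup data grows up from the bottom of the user buffer while temp data grows down from the top, and either side fails when the two ends would cross. A standalone sketch of the same scheme (hypothetical names, not library code):

```c
// Hypothetical two-ended arena mirroring the scheme above:
// 'bottom' grows up for long-lived data, 'top' grows down for temp data.
typedef struct { char *buf; int bottom, top; } arena;

static void *arena_bottom_alloc(arena *a, int sz)
{
   sz = (sz+7) & ~7;                        // 8-byte alignment, as above
   if (a->bottom + sz > a->top) return 0;   // the ends would cross
   a->bottom += sz;
   return a->buf + a->bottom - sz;
}

static void *arena_top_alloc(arena *a, int sz)
{
   sz = (sz+7) & ~7;
   if (a->top - sz < a->bottom) return 0;
   a->top -= sz;
   return a->buf + a->top;
}
```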
+
+#define CRC32_POLY 0x04c11db7 // from spec
+
+static uint32 crc_table[256];
+static void crc32_init(void)
+{
+ int i,j;
+ uint32 s;
+ for(i=0; i < 256; i++) {
+ for (s=(uint32) i << 24, j=0; j < 8; ++j)
+ s = (s << 1) ^ (s >= (1U<<31) ? CRC32_POLY : 0);
+ crc_table[i] = s;
+ }
+}
+
+static __forceinline uint32 crc32_update(uint32 crc, uint8 byte)
+{
+ return (crc << 8) ^ crc_table[byte ^ (crc >> 24)];
+}
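The table is a byte-at-a-time refactoring of the MSB-first CRC-32 used by Ogg (polynomial 0x04c11db7, initial value 0, no bit reflection, no final xor). A standalone copy (renamed to avoid clashing with the definitions above) can be checked against a direct bit-by-bit computation:

```c
#define DEMO_CRC32_POLY 0x04c11db7u

static unsigned int demo_crc_table[256];

// same table construction as crc32_init above
static void demo_crc32_init(void)
{
   int i, j;
   unsigned int s;
   for (i=0; i < 256; i++) {
      for (s=(unsigned int) i << 24, j=0; j < 8; ++j)
         s = (s << 1) ^ (s >= (1u<<31) ? DEMO_CRC32_POLY : 0);
      demo_crc_table[i] = s;
   }
}

// table-driven, one byte per step (mirrors crc32_update above)
static unsigned int demo_crc32(const unsigned char *p, int n)
{
   unsigned int crc = 0;
   while (n--) crc = (crc << 8) ^ demo_crc_table[*p++ ^ (crc >> 24)];
   return crc;
}

// reference: the same CRC computed one bit at a time
static unsigned int demo_crc32_bitwise(const unsigned char *p, int n)
{
   unsigned int crc = 0;
   while (n--) {
      int j;
      crc ^= (unsigned int) *p++ << 24;
      for (j=0; j < 8; ++j)
         crc = (crc << 1) ^ ((crc & 0x80000000u) ? DEMO_CRC32_POLY : 0);
   }
   return crc;
}
```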
+
+
+// used in setup, and for huffman that doesn't go fast path
+static unsigned int bit_reverse(unsigned int n)
+{
+ n = ((n & 0xAAAAAAAA) >> 1) | ((n & 0x55555555) << 1);
+ n = ((n & 0xCCCCCCCC) >> 2) | ((n & 0x33333333) << 2);
+ n = ((n & 0xF0F0F0F0) >> 4) | ((n & 0x0F0F0F0F) << 4);
+ n = ((n & 0xFF00FF00) >> 8) | ((n & 0x00FF00FF) << 8);
+ return (n >> 16) | (n << 16);
+}
+
+static float square(float x)
+{
+ return x*x;
+}
+
+// this is a weird definition of log2() for which log2(1) = 1, log2(2) = 2, log2(4) = 3
+// as required by the specification. fast(?) implementation from stb.h
+// @OPTIMIZE: called multiple times per-packet with "constants"; move to setup
+static int ilog(int32 n)
+{
+ static signed char log2_4[16] = { 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4 };
+
+ if (n < 0) return 0; // signed n returns 0
+
+ // 2 compares if n < 16, 3 compares otherwise (4 if signed or n > 1<<29)
+ if (n < (1 << 14))
+ if (n < (1 << 4)) return 0 + log2_4[n ];
+ else if (n < (1 << 9)) return 5 + log2_4[n >> 5];
+ else return 10 + log2_4[n >> 10];
+ else if (n < (1 << 24))
+ if (n < (1 << 19)) return 15 + log2_4[n >> 15];
+ else return 20 + log2_4[n >> 20];
+ else if (n < (1 << 29)) return 25 + log2_4[n >> 25];
+ else return 30 + log2_4[n >> 30];
+}
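In other words, for n > 0 the function above returns the number of bits needed to represent n (floor(log2 n) + 1). A naive reference loop makes the spec's definition concrete:

```c
// Naive reference for the spec's ilog: number of bits needed to
// represent n, so ilog(1)=1, ilog(2)=2, ilog(4)=3, and ilog(n<=0)=0.
static int ilog_ref(int n)
{
   int r = 0;
   while (n > 0) { ++r; n >>= 1; }
   return r;
}
```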
+
+#ifndef M_PI
+ #define M_PI 3.14159265358979323846264f // from CRC
+#endif
+
+// code length assigned to a value with no huffman encoding
+#define NO_CODE 255
+
+/////////////////////// LEAF SETUP FUNCTIONS //////////////////////////
+//
+// these functions are only called at setup, and only a few times
+// per file
+
+static float float32_unpack(uint32 x)
+{
+ // from the specification
+ uint32 mantissa = x & 0x1fffff;
+ uint32 sign = x & 0x80000000;
+ uint32 exp = (x & 0x7fe00000) >> 21;
+ double res = sign ? -(double)mantissa : (double)mantissa;
+ return (float) ldexp((float)res, (int)exp-788);
+}
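The packed format is: a 21-bit mantissa in the low bits, a 10-bit biased exponent in bits 21..30, and the sign in bit 31, giving a value of ±mantissa × 2^(exponent−788). A standalone copy (renamed so it can be tested on its own) demonstrates the decoding:

```c
#include <math.h>

// Copy of float32_unpack above, renamed for standalone use:
// value = (sign ? -1 : 1) * mantissa * 2^(exponent - 788)
static float demo_float32_unpack(unsigned int x)
{
   unsigned int mantissa = x & 0x1fffff;
   unsigned int sign = x & 0x80000000;
   unsigned int exp = (x & 0x7fe00000) >> 21;
   double res = sign ? -(double)mantissa : (double)mantissa;
   return (float) ldexp((float)res, (int)exp-788);
}
```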
+
+
+// zlib & jpeg huffman tables assume that the output symbols
+// can either be arbitrarily arranged, or have monotonically
+// increasing frequencies--they rely on the lengths being sorted;
+// this makes for a very simple generation algorithm.
+// vorbis allows a huffman table with non-sorted lengths. This
+// requires a more sophisticated construction, since symbols in
+// order do not map to huffman codes "in order".
+static void add_entry(Codebook *c, uint32 huff_code, int symbol, int count, int len, uint32 *values)
+{
+ if (!c->sparse) {
+ c->codewords [symbol] = huff_code;
+ } else {
+ c->codewords [count] = huff_code;
+ c->codeword_lengths[count] = len;
+ values [count] = symbol;
+ }
+}
+
+static int compute_codewords(Codebook *c, uint8 *len, int n, uint32 *values)
+{
+ int i,k,m=0;
+ uint32 available[32];
+
+ memset(available, 0, sizeof(available));
+ // find the first entry
+ for (k=0; k < n; ++k) if (len[k] < NO_CODE) break;
+ if (k == n) { assert(c->sorted_entries == 0); return TRUE; }
+ assert(len[k] < 32); // no error return required, code reading lens checks this
+ // add to the list
+ add_entry(c, 0, k, m++, len[k], values);
+ // add all available leaves
+ for (i=1; i <= len[k]; ++i)
+ available[i] = 1U << (32-i);
+ // note that the above code treats the first case specially,
+ // but it's really the same as the following code, so they
+ // could probably be combined (except the initial code is 0,
+ // and I use 0 in available[] to mean 'empty')
+ for (i=k+1; i < n; ++i) {
+ uint32 res;
+ int z = len[i], y;
+ if (z == NO_CODE) continue;
+ assert(z < 32); // no error return required, code reading lens checks this
+ // find lowest available leaf (should always be earliest,
+ // which is what the specification calls for)
+ // note that this property, and the fact we can never have
+ // more than one free leaf at a given level, isn't totally
+ // trivial to prove, but it seems true and the assert never
+ // fires, so!
+ while (z > 0 && !available[z]) --z;
+ if (z == 0) { return FALSE; }
+ res = available[z];
+ available[z] = 0;
+ add_entry(c, bit_reverse(res), i, m++, len[i], values);
+ // propagate availability up the tree
+ if (z != len[i]) {
+ for (y=len[i]; y > z; --y) {
+ assert(available[y] == 0);
+ available[y] = res + (1 << (32-y));
+ }
+ }
+ }
+ return TRUE;
+}
+
+// accelerated huffman table allows fast O(1) match of all symbols
+// of length <= STB_VORBIS_FAST_HUFFMAN_LENGTH
+static void compute_accelerated_huffman(Codebook *c)
+{
+ int i, len;
+ for (i=0; i < FAST_HUFFMAN_TABLE_SIZE; ++i)
+ c->fast_huffman[i] = -1;
+
+ len = c->sparse ? c->sorted_entries : c->entries;
+ #ifdef STB_VORBIS_FAST_HUFFMAN_SHORT
+ if (len > 32767) len = 32767; // largest possible value we can encode!
+ #endif
+ for (i=0; i < len; ++i) {
+ if (c->codeword_lengths[i] <= STB_VORBIS_FAST_HUFFMAN_LENGTH) {
+ uint32 z = c->sparse ? bit_reverse(c->sorted_codewords[i]) : c->codewords[i];
+ // set table entries for all bit combinations in the higher bits
+ while (z < FAST_HUFFMAN_TABLE_SIZE) {
+ c->fast_huffman[z] = i;
+ z += 1 << c->codeword_lengths[i];
+ }
+ }
+ }
+}
+
+#ifdef _MSC_VER
+#define STBV_CDECL __cdecl
+#else
+#define STBV_CDECL
+#endif
+
+static int STBV_CDECL uint32_compare(const void *p, const void *q)
+{
+ uint32 x = * (uint32 *) p;
+ uint32 y = * (uint32 *) q;
+ return x < y ? -1 : x > y;
+}
+
+static int include_in_sort(Codebook *c, uint8 len)
+{
+ if (c->sparse) { assert(len != NO_CODE); return TRUE; }
+ if (len == NO_CODE) return FALSE;
+ if (len > STB_VORBIS_FAST_HUFFMAN_LENGTH) return TRUE;
+ return FALSE;
+}
+
+// if the fast table above doesn't work, we want to binary
+// search them... need to reverse the bits
+static void compute_sorted_huffman(Codebook *c, uint8 *lengths, uint32 *values)
+{
+ int i, len;
+ // build a list of all the entries
+ // OPTIMIZATION: don't include the short ones, since they'll be caught by FAST_HUFFMAN.
+ // this is kind of a frivolous optimization--I don't see any performance improvement,
+ // but it's like 4 extra lines of code, so.
+ if (!c->sparse) {
+ int k = 0;
+ for (i=0; i < c->entries; ++i)
+ if (include_in_sort(c, lengths[i]))
+ c->sorted_codewords[k++] = bit_reverse(c->codewords[i]);
+ assert(k == c->sorted_entries);
+ } else {
+ for (i=0; i < c->sorted_entries; ++i)
+ c->sorted_codewords[i] = bit_reverse(c->codewords[i]);
+ }
+
+ qsort(c->sorted_codewords, c->sorted_entries, sizeof(c->sorted_codewords[0]), uint32_compare);
+ c->sorted_codewords[c->sorted_entries] = 0xffffffff;
+
+ len = c->sparse ? c->sorted_entries : c->entries;
+ // now we need to indicate how they correspond; we could either
+ // #1: sort a different data structure that says who they correspond to
+ // #2: for each sorted entry, search the original list to find who corresponds
+ // #3: for each original entry, find the sorted entry
+ // #1 requires extra storage, #2 is slow, #3 can use binary search!
+ for (i=0; i < len; ++i) {
+ int huff_len = c->sparse ? lengths[values[i]] : lengths[i];
+ if (include_in_sort(c,huff_len)) {
+ uint32 code = bit_reverse(c->codewords[i]);
+ int x=0, n=c->sorted_entries;
+ while (n > 1) {
+ // invariant: sc[x] <= code < sc[x+n]
+ int m = x + (n >> 1);
+ if (c->sorted_codewords[m] <= code) {
+ x = m;
+ n -= (n>>1);
+ } else {
+ n >>= 1;
+ }
+ }
+ assert(c->sorted_codewords[x] == code);
+ if (c->sparse) {
+ c->sorted_values[x] = values[i];
+ c->codeword_lengths[x] = huff_len;
+ } else {
+ c->sorted_values[x] = i;
+ }
+ }
+ }
+}
+
+// only run while parsing the header (3 times)
+static int vorbis_validate(uint8 *data)
+{
+ static uint8 vorbis[6] = { 'v', 'o', 'r', 'b', 'i', 's' };
+ return memcmp(data, vorbis, 6) == 0;
+}
+
+// called from setup only, once per code book
+// (formula implied by specification)
+static int lookup1_values(int entries, int dim)
+{
+ int r = (int) floor(exp((float) log((float) entries) / dim));
+ if ((int) floor(pow((float) r+1, dim)) <= entries) // (int) cast for MinGW warning;
+ ++r; // floor() to avoid _ftol() when non-CRT
+ if (pow((float) r+1, dim) <= entries)
+ return -1;
+ if ((int) floor(pow((float) r, dim)) > entries)
+ return -1;
+ return r;
+}
+
+// called twice per file
+static void compute_twiddle_factors(int n, float *A, float *B, float *C)
+{
+ int n4 = n >> 2, n8 = n >> 3;
+ int k,k2;
+
+ for (k=k2=0; k < n4; ++k,k2+=2) {
+ A[k2 ] = (float) cos(4*k*M_PI/n);
+ A[k2+1] = (float) -sin(4*k*M_PI/n);
+ B[k2 ] = (float) cos((k2+1)*M_PI/n/2) * 0.5f;
+ B[k2+1] = (float) sin((k2+1)*M_PI/n/2) * 0.5f;
+ }
+ for (k=k2=0; k < n8; ++k,k2+=2) {
+ C[k2 ] = (float) cos(2*(k2+1)*M_PI/n);
+ C[k2+1] = (float) -sin(2*(k2+1)*M_PI/n);
+ }
+}
+
+static void compute_window(int n, float *window)
+{
+ int n2 = n >> 1, i;
+ for (i=0; i < n2; ++i)
+ window[i] = (float) sin(0.5 * M_PI * square((float) sin((i - 0 + 0.5) / n2 * 0.5 * M_PI)));
+}
+
+static void compute_bitreverse(int n, uint16 *rev)
+{
+ int ld = ilog(n) - 1; // ilog is off-by-one from normal definitions
+ int i, n8 = n >> 3;
+ for (i=0; i < n8; ++i)
+ rev[i] = (bit_reverse(i) >> (32-ld+3)) << 2;
+}
+
+static int init_blocksize(vorb *f, int b, int n)
+{
+ int n2 = n >> 1, n4 = n >> 2, n8 = n >> 3;
+ f->A[b] = (float *) setup_malloc(f, sizeof(float) * n2);
+ f->B[b] = (float *) setup_malloc(f, sizeof(float) * n2);
+ f->C[b] = (float *) setup_malloc(f, sizeof(float) * n4);
+ if (!f->A[b] || !f->B[b] || !f->C[b]) return error(f, VORBIS_outofmem);
+ compute_twiddle_factors(n, f->A[b], f->B[b], f->C[b]);
+ f->window[b] = (float *) setup_malloc(f, sizeof(float) * n2);
+ if (!f->window[b]) return error(f, VORBIS_outofmem);
+ compute_window(n, f->window[b]);
+ f->bit_reverse[b] = (uint16 *) setup_malloc(f, sizeof(uint16) * n8);
+ if (!f->bit_reverse[b]) return error(f, VORBIS_outofmem);
+ compute_bitreverse(n, f->bit_reverse[b]);
+ return TRUE;
+}
+
+static void neighbors(uint16 *x, int n, int *plow, int *phigh)
+{
+ int low = -1;
+ int high = 65536;
+ int i;
+ for (i=0; i < n; ++i) {
+ if (x[i] > low && x[i] < x[n]) { *plow = i; low = x[i]; }
+ if (x[i] < high && x[i] > x[n]) { *phigh = i; high = x[i]; }
+ }
+}
+
+ // this struct has been repurposed: 'id' now stores the point's original index rather than its y value
+typedef struct
+{
+ uint16 x,id;
+} stbv__floor_ordering;
+
+static int STBV_CDECL point_compare(const void *p, const void *q)
+{
+ stbv__floor_ordering *a = (stbv__floor_ordering *) p;
+ stbv__floor_ordering *b = (stbv__floor_ordering *) q;
+ return a->x < b->x ? -1 : a->x > b->x;
+}
+
+//
+/////////////////////// END LEAF SETUP FUNCTIONS //////////////////////////
+
+
+#if defined(STB_VORBIS_NO_STDIO)
+ #define USE_MEMORY(z) TRUE
+#else
+ #define USE_MEMORY(z) ((z)->stream)
+#endif
+
+static uint8 get8(vorb *z)
+{
+ if (USE_MEMORY(z)) {
+ if (z->stream >= z->stream_end) { z->eof = TRUE; return 0; }
+ return *z->stream++;
+ }
+
+ #ifndef STB_VORBIS_NO_STDIO
+ {
+ int c = fgetc(z->f);
+ if (c == EOF) { z->eof = TRUE; return 0; }
+ return c;
+ }
+ #endif
+}
+
+static uint32 get32(vorb *f)
+{
+ uint32 x;
+ x = get8(f);
+ x += get8(f) << 8;
+ x += get8(f) << 16;
+ x += (uint32) get8(f) << 24;
+ return x;
+}
+
+static int getn(vorb *z, uint8 *data, int n)
+{
+ if (USE_MEMORY(z)) {
+ if (z->stream+n > z->stream_end) { z->eof = 1; return 0; }
+ memcpy(data, z->stream, n);
+ z->stream += n;
+ return 1;
+ }
+
+ #ifndef STB_VORBIS_NO_STDIO
+ if (fread(data, n, 1, z->f) == 1)
+ return 1;
+ else {
+ z->eof = 1;
+ return 0;
+ }
+ #endif
+}
+
+static void skip(vorb *z, int n)
+{
+ if (USE_MEMORY(z)) {
+ z->stream += n;
+ if (z->stream >= z->stream_end) z->eof = 1;
+ return;
+ }
+ #ifndef STB_VORBIS_NO_STDIO
+ {
+ long x = ftell(z->f);
+ fseek(z->f, x+n, SEEK_SET);
+ }
+ #endif
+}
+
+static int set_file_offset(stb_vorbis *f, unsigned int loc)
+{
+ #ifndef STB_VORBIS_NO_PUSHDATA_API
+ if (f->push_mode) return 0;
+ #endif
+ f->eof = 0;
+ if (USE_MEMORY(f)) {
+ if (f->stream_start + loc >= f->stream_end || f->stream_start + loc < f->stream_start) {
+ f->stream = f->stream_end;
+ f->eof = 1;
+ return 0;
+ } else {
+ f->stream = f->stream_start + loc;
+ return 1;
+ }
+ }
+ #ifndef STB_VORBIS_NO_STDIO
+ if (loc + f->f_start < loc || loc >= 0x80000000) {
+ loc = 0x7fffffff;
+ f->eof = 1;
+ } else {
+ loc += f->f_start;
+ }
+ if (!fseek(f->f, loc, SEEK_SET))
+ return 1;
+ f->eof = 1;
+ fseek(f->f, f->f_start, SEEK_END);
+ return 0;
+ #endif
+}
+
+
+static uint8 ogg_page_header[4] = { 0x4f, 0x67, 0x67, 0x53 };
+
+static int capture_pattern(vorb *f)
+{
+ if (0x4f != get8(f)) return FALSE;
+ if (0x67 != get8(f)) return FALSE;
+ if (0x67 != get8(f)) return FALSE;
+ if (0x53 != get8(f)) return FALSE;
+ return TRUE;
+}
+
+#define PAGEFLAG_continued_packet 1
+#define PAGEFLAG_first_page 2
+#define PAGEFLAG_last_page 4
+
+static int start_page_no_capturepattern(vorb *f)
+{
+ uint32 loc0,loc1,n;
+ if (f->first_decode && !IS_PUSH_MODE(f)) {
+ f->p_first.page_start = stb_vorbis_get_file_offset(f) - 4;
+ }
+ // stream structure version
+ if (0 != get8(f)) return error(f, VORBIS_invalid_stream_structure_version);
+ // header flag
+ f->page_flag = get8(f);
+ // absolute granule position
+ loc0 = get32(f);
+ loc1 = get32(f);
+ // @TODO: validate loc0,loc1 as valid positions?
+ // stream serial number -- vorbis doesn't interleave, so discard
+ get32(f);
+ //if (f->serial != get32(f)) return error(f, VORBIS_incorrect_stream_serial_number);
+ // page sequence number
+ n = get32(f);
+ f->last_page = n;
+ // CRC32
+ get32(f);
+ // page_segments
+ f->segment_count = get8(f);
+ if (!getn(f, f->segments, f->segment_count))
+ return error(f, VORBIS_unexpected_eof);
+ // assume we _don't_ know the sample position of any segments
+ f->end_seg_with_known_loc = -2;
+ if (loc0 != ~0U || loc1 != ~0U) {
+ int i;
+ // determine which packet is the last one that will complete
+ for (i=f->segment_count-1; i >= 0; --i)
+ if (f->segments[i] < 255)
+ break;
+ // 'i' is now the index of the _last_ segment of a packet that ends
+ if (i >= 0) {
+ f->end_seg_with_known_loc = i;
+ f->known_loc_for_packet = loc0;
+ }
+ }
+ if (f->first_decode) {
+ int i,len;
+ len = 0;
+ for (i=0; i < f->segment_count; ++i)
+ len += f->segments[i];
+ len += 27 + f->segment_count;
+ f->p_first.page_end = f->p_first.page_start + len;
+ f->p_first.last_decoded_sample = loc0;
+ }
+ f->next_seg = 0;
+ return TRUE;
+}
+
+static int start_page(vorb *f)
+{
+ if (!capture_pattern(f)) return error(f, VORBIS_missing_capture_pattern);
+ return start_page_no_capturepattern(f);
+}
+
+static int start_packet(vorb *f)
+{
+ while (f->next_seg == -1) {
+ if (!start_page(f)) return FALSE;
+ if (f->page_flag & PAGEFLAG_continued_packet)
+ return error(f, VORBIS_continued_packet_flag_invalid);
+ }
+ f->last_seg = FALSE;
+ f->valid_bits = 0;
+ f->packet_bytes = 0;
+ f->bytes_in_seg = 0;
+ // f->next_seg is now valid
+ return TRUE;
+}
+
+static int maybe_start_packet(vorb *f)
+{
+ if (f->next_seg == -1) {
+ int x = get8(f);
+ if (f->eof) return FALSE; // EOF at page boundary is not an error!
+ if (0x4f != x) return error(f, VORBIS_missing_capture_pattern);
+ if (0x67 != get8(f)) return error(f, VORBIS_missing_capture_pattern);
+ if (0x67 != get8(f)) return error(f, VORBIS_missing_capture_pattern);
+ if (0x53 != get8(f)) return error(f, VORBIS_missing_capture_pattern);
+ if (!start_page_no_capturepattern(f)) return FALSE;
+ if (f->page_flag & PAGEFLAG_continued_packet) {
+ // set up enough state that we can read this packet if we want,
+ // e.g. during recovery
+ f->last_seg = FALSE;
+ f->bytes_in_seg = 0;
+ return error(f, VORBIS_continued_packet_flag_invalid);
+ }
+ }
+ return start_packet(f);
+}
+
+static int next_segment(vorb *f)
+{
+ int len;
+ if (f->last_seg) return 0;
+ if (f->next_seg == -1) {
+ f->last_seg_which = f->segment_count-1; // in case start_page fails
+ if (!start_page(f)) { f->last_seg = 1; return 0; }
+ if (!(f->page_flag & PAGEFLAG_continued_packet)) return error(f, VORBIS_continued_packet_flag_invalid);
+ }
+ len = f->segments[f->next_seg++];
+ if (len < 255) {
+ f->last_seg = TRUE;
+ f->last_seg_which = f->next_seg-1;
+ }
+ if (f->next_seg >= f->segment_count)
+ f->next_seg = -1;
+ assert(f->bytes_in_seg == 0);
+ f->bytes_in_seg = len;
+ return len;
+}
+
+#define EOP (-1)
+#define INVALID_BITS (-1)
+
+static int get8_packet_raw(vorb *f)
+{
+ if (!f->bytes_in_seg) { // CLANG!
+ if (f->last_seg) return EOP;
+ else if (!next_segment(f)) return EOP;
+ }
+ assert(f->bytes_in_seg > 0);
+ --f->bytes_in_seg;
+ ++f->packet_bytes;
+ return get8(f);
+}
+
+static int get8_packet(vorb *f)
+{
+ int x = get8_packet_raw(f);
+ f->valid_bits = 0;
+ return x;
+}
+
+static int get32_packet(vorb *f)
+{
+ uint32 x;
+ x = get8_packet(f);
+ x += get8_packet(f) << 8;
+ x += get8_packet(f) << 16;
+ x += (uint32) get8_packet(f) << 24;
+ return x;
+}
+
+static void flush_packet(vorb *f)
+{
+ while (get8_packet_raw(f) != EOP);
+}
+
+// @OPTIMIZE: this is the secondary bit decoder, so it's probably not as important
+// as the huffman decoder?
+static uint32 get_bits(vorb *f, int n)
+{
+ uint32 z;
+
+ if (f->valid_bits < 0) return 0;
+ if (f->valid_bits < n) {
+ if (n > 24) {
+ // the accumulator technique below would not work correctly in this case
+ z = get_bits(f, 24);
+ z += get_bits(f, n-24) << 24;
+ return z;
+ }
+ if (f->valid_bits == 0) f->acc = 0;
+ while (f->valid_bits < n) {
+ int z = get8_packet_raw(f);
+ if (z == EOP) {
+ f->valid_bits = INVALID_BITS;
+ return 0;
+ }
+ f->acc += z << f->valid_bits;
+ f->valid_bits += 8;
+ }
+ }
+
+ assert(f->valid_bits >= n);
+ z = f->acc & ((1 << n)-1);
+ f->acc >>= n;
+ f->valid_bits -= n;
+ return z;
+}
+
+// @OPTIMIZE: primary accumulator for huffman
+// expand the buffer to as many bits as possible without reading off end of packet
+// it might be nice to allow f->valid_bits and f->acc to be stored in registers,
+// e.g. cache them locally and decode locally
+static __forceinline void prep_huffman(vorb *f)
+{
+ if (f->valid_bits <= 24) {
+ if (f->valid_bits == 0) f->acc = 0;
+ do {
+ int z;
+ if (f->last_seg && !f->bytes_in_seg) return;
+ z = get8_packet_raw(f);
+ if (z == EOP) return;
+ f->acc += (unsigned) z << f->valid_bits;
+ f->valid_bits += 8;
+ } while (f->valid_bits <= 24);
+ }
+}
+
+enum
+{
+ VORBIS_packet_id = 1,
+ VORBIS_packet_comment = 3,
+ VORBIS_packet_setup = 5
+};
+
+static int codebook_decode_scalar_raw(vorb *f, Codebook *c)
+{
+ int i;
+ prep_huffman(f);
+
+ if (c->codewords == NULL && c->sorted_codewords == NULL)
+ return -1;
+
+ // cases to use binary search: sorted_codewords && !c->codewords
+ // sorted_codewords && c->entries > 8
+ if (c->entries > 8 ? c->sorted_codewords!=NULL : !c->codewords) {
+ // binary search
+ uint32 code = bit_reverse(f->acc);
+ int x=0, n=c->sorted_entries, len;
+
+ while (n > 1) {
+ // invariant: sc[x] <= code < sc[x+n]
+ int m = x + (n >> 1);
+ if (c->sorted_codewords[m] <= code) {
+ x = m;
+ n -= (n>>1);
+ } else {
+ n >>= 1;
+ }
+ }
+ // x is now the sorted index
+ if (!c->sparse) x = c->sorted_values[x];
+ // x is now sorted index if sparse, or symbol otherwise
+ len = c->codeword_lengths[x];
+ if (f->valid_bits >= len) {
+ f->acc >>= len;
+ f->valid_bits -= len;
+ return x;
+ }
+
+ f->valid_bits = 0;
+ return -1;
+ }
+
+ // if small, linear search
+ assert(!c->sparse);
+ for (i=0; i < c->entries; ++i) {
+ if (c->codeword_lengths[i] == NO_CODE) continue;
+ if (c->codewords[i] == (f->acc & ((1 << c->codeword_lengths[i])-1))) {
+ if (f->valid_bits >= c->codeword_lengths[i]) {
+ f->acc >>= c->codeword_lengths[i];
+ f->valid_bits -= c->codeword_lengths[i];
+ return i;
+ }
+ f->valid_bits = 0;
+ return -1;
+ }
+ }
+
+ error(f, VORBIS_invalid_stream);
+ f->valid_bits = 0;
+ return -1;
+}
+
+#ifndef STB_VORBIS_NO_INLINE_DECODE
+
+#define DECODE_RAW(var, f,c) \
+ if (f->valid_bits < STB_VORBIS_FAST_HUFFMAN_LENGTH) \
+ prep_huffman(f); \
+ var = f->acc & FAST_HUFFMAN_TABLE_MASK; \
+ var = c->fast_huffman[var]; \
+ if (var >= 0) { \
+ int n = c->codeword_lengths[var]; \
+ f->acc >>= n; \
+ f->valid_bits -= n; \
+ if (f->valid_bits < 0) { f->valid_bits = 0; var = -1; } \
+ } else { \
+ var = codebook_decode_scalar_raw(f,c); \
+ }
+
+#else
+
+static int codebook_decode_scalar(vorb *f, Codebook *c)
+{
+ int i;
+ if (f->valid_bits < STB_VORBIS_FAST_HUFFMAN_LENGTH)
+ prep_huffman(f);
+ // fast huffman table lookup
+ i = f->acc & FAST_HUFFMAN_TABLE_MASK;
+ i = c->fast_huffman[i];
+ if (i >= 0) {
+ f->acc >>= c->codeword_lengths[i];
+ f->valid_bits -= c->codeword_lengths[i];
+ if (f->valid_bits < 0) { f->valid_bits = 0; return -1; }
+ return i;
+ }
+ return codebook_decode_scalar_raw(f,c);
+}
+
+#define DECODE_RAW(var,f,c) var = codebook_decode_scalar(f,c);
+
+#endif
+
+#define DECODE(var,f,c) \
+ DECODE_RAW(var,f,c) \
+ if (c->sparse) var = c->sorted_values[var];
+
+#ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ #define DECODE_VQ(var,f,c) DECODE_RAW(var,f,c)
+#else
+ #define DECODE_VQ(var,f,c) DECODE(var,f,c)
+#endif
+
+
+
+
+
+
+// CODEBOOK_ELEMENT_FAST is an optimization for the CODEBOOK_FLOATS case
+// where we avoid one addition
+#define CODEBOOK_ELEMENT(c,off) (c->multiplicands[off])
+#define CODEBOOK_ELEMENT_FAST(c,off) (c->multiplicands[off])
+#define CODEBOOK_ELEMENT_BASE(c) (0)
+
+static int codebook_decode_start(vorb *f, Codebook *c)
+{
+ int z = -1;
+
+ // type 0 is only legal in a scalar context
+ if (c->lookup_type == 0)
+ error(f, VORBIS_invalid_stream);
+ else {
+ DECODE_VQ(z,f,c);
+ if (c->sparse) assert(z < c->sorted_entries);
+ if (z < 0) { // check for EOP
+ if (!f->bytes_in_seg)
+ if (f->last_seg)
+ return z;
+ error(f, VORBIS_invalid_stream);
+ }
+ }
+ return z;
+}
+
+static int codebook_decode(vorb *f, Codebook *c, float *output, int len)
+{
+ int i,z = codebook_decode_start(f,c);
+ if (z < 0) return FALSE;
+ if (len > c->dimensions) len = c->dimensions;
+
+#ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ if (c->lookup_type == 1) {
+ float last = CODEBOOK_ELEMENT_BASE(c);
+ int div = 1;
+ for (i=0; i < len; ++i) {
+ int off = (z / div) % c->lookup_values;
+ float val = CODEBOOK_ELEMENT_FAST(c,off) + last;
+ output[i] += val;
+ if (c->sequence_p) last = val + c->minimum_value;
+ div *= c->lookup_values;
+ }
+ return TRUE;
+ }
+#endif
+
+ z *= c->dimensions;
+ if (c->sequence_p) {
+ float last = CODEBOOK_ELEMENT_BASE(c);
+ for (i=0; i < len; ++i) {
+ float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+ output[i] += val;
+ last = val + c->minimum_value;
+ }
+ } else {
+ float last = CODEBOOK_ELEMENT_BASE(c);
+ for (i=0; i < len; ++i) {
+ output[i] += CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+ }
+ }
+
+ return TRUE;
+}
+
+static int codebook_decode_step(vorb *f, Codebook *c, float *output, int len, int step)
+{
+ int i,z = codebook_decode_start(f,c);
+ float last = CODEBOOK_ELEMENT_BASE(c);
+ if (z < 0) return FALSE;
+ if (len > c->dimensions) len = c->dimensions;
+
+#ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ if (c->lookup_type == 1) {
+ int div = 1;
+ for (i=0; i < len; ++i) {
+ int off = (z / div) % c->lookup_values;
+ float val = CODEBOOK_ELEMENT_FAST(c,off) + last;
+ output[i*step] += val;
+ if (c->sequence_p) last = val;
+ div *= c->lookup_values;
+ }
+ return TRUE;
+ }
+#endif
+
+ z *= c->dimensions;
+ for (i=0; i < len; ++i) {
+ float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+ output[i*step] += val;
+ if (c->sequence_p) last = val;
+ }
+
+ return TRUE;
+}
+
+static int codebook_decode_deinterleave_repeat(vorb *f, Codebook *c, float **outputs, int ch, int *c_inter_p, int *p_inter_p, int len, int total_decode)
+{
+ int c_inter = *c_inter_p;
+ int p_inter = *p_inter_p;
+ int i,z, effective = c->dimensions;
+
+ // type 0 is only legal in a scalar context
+ if (c->lookup_type == 0) return error(f, VORBIS_invalid_stream);
+
+ while (total_decode > 0) {
+ float last = CODEBOOK_ELEMENT_BASE(c);
+ DECODE_VQ(z,f,c);
+ #ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ assert(!c->sparse || z < c->sorted_entries);
+ #endif
+ if (z < 0) {
+ if (!f->bytes_in_seg)
+ if (f->last_seg) return FALSE;
+ return error(f, VORBIS_invalid_stream);
+ }
+
+ // if this will take us off the end of the buffers, stop short!
+ // we check by computing the length of the virtual interleaved
+ // buffer (len*ch), our current offset within it (p_inter*ch)+(c_inter),
+ // and the length we'll be using (effective)
+ if (c_inter + p_inter*ch + effective > len * ch) {
+ effective = len*ch - (p_inter*ch - c_inter);
+ }
+
+ #ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ if (c->lookup_type == 1) {
+ int div = 1;
+ for (i=0; i < effective; ++i) {
+ int off = (z / div) % c->lookup_values;
+ float val = CODEBOOK_ELEMENT_FAST(c,off) + last;
+ if (outputs[c_inter])
+ outputs[c_inter][p_inter] += val;
+ if (++c_inter == ch) { c_inter = 0; ++p_inter; }
+ if (c->sequence_p) last = val;
+ div *= c->lookup_values;
+ }
+ } else
+ #endif
+ {
+ z *= c->dimensions;
+ if (c->sequence_p) {
+ for (i=0; i < effective; ++i) {
+ float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+ if (outputs[c_inter])
+ outputs[c_inter][p_inter] += val;
+ if (++c_inter == ch) { c_inter = 0; ++p_inter; }
+ last = val;
+ }
+ } else {
+ for (i=0; i < effective; ++i) {
+ float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+ if (outputs[c_inter])
+ outputs[c_inter][p_inter] += val;
+ if (++c_inter == ch) { c_inter = 0; ++p_inter; }
+ }
+ }
+ }
+
+ total_decode -= effective;
+ }
+ *c_inter_p = c_inter;
+ *p_inter_p = p_inter;
+ return TRUE;
+}
+
+static int predict_point(int x, int x0, int x1, int y0, int y1)
+{
+ int dy = y1 - y0;
+ int adx = x1 - x0;
+ // @OPTIMIZE: force int division to round in the right direction... is this necessary on x86?
+ int err = abs(dy) * (x - x0);
+ int off = err / adx;
+ return dy < 0 ? y0 - off : y0 + off;
+}
+
+// the following table is block-copied from the specification
+static float inverse_db_table[256] =
+{
+ 1.0649863e-07f, 1.1341951e-07f, 1.2079015e-07f, 1.2863978e-07f,
+ 1.3699951e-07f, 1.4590251e-07f, 1.5538408e-07f, 1.6548181e-07f,
+ 1.7623575e-07f, 1.8768855e-07f, 1.9988561e-07f, 2.1287530e-07f,
+ 2.2670913e-07f, 2.4144197e-07f, 2.5713223e-07f, 2.7384213e-07f,
+ 2.9163793e-07f, 3.1059021e-07f, 3.3077411e-07f, 3.5226968e-07f,
+ 3.7516214e-07f, 3.9954229e-07f, 4.2550680e-07f, 4.5315863e-07f,
+ 4.8260743e-07f, 5.1396998e-07f, 5.4737065e-07f, 5.8294187e-07f,
+ 6.2082472e-07f, 6.6116941e-07f, 7.0413592e-07f, 7.4989464e-07f,
+ 7.9862701e-07f, 8.5052630e-07f, 9.0579828e-07f, 9.6466216e-07f,
+ 1.0273513e-06f, 1.0941144e-06f, 1.1652161e-06f, 1.2409384e-06f,
+ 1.3215816e-06f, 1.4074654e-06f, 1.4989305e-06f, 1.5963394e-06f,
+ 1.7000785e-06f, 1.8105592e-06f, 1.9282195e-06f, 2.0535261e-06f,
+ 2.1869758e-06f, 2.3290978e-06f, 2.4804557e-06f, 2.6416497e-06f,
+ 2.8133190e-06f, 2.9961443e-06f, 3.1908506e-06f, 3.3982101e-06f,
+ 3.6190449e-06f, 3.8542308e-06f, 4.1047004e-06f, 4.3714470e-06f,
+ 4.6555282e-06f, 4.9580707e-06f, 5.2802740e-06f, 5.6234160e-06f,
+ 5.9888572e-06f, 6.3780469e-06f, 6.7925283e-06f, 7.2339451e-06f,
+ 7.7040476e-06f, 8.2047000e-06f, 8.7378876e-06f, 9.3057248e-06f,
+ 9.9104632e-06f, 1.0554501e-05f, 1.1240392e-05f, 1.1970856e-05f,
+ 1.2748789e-05f, 1.3577278e-05f, 1.4459606e-05f, 1.5399272e-05f,
+ 1.6400004e-05f, 1.7465768e-05f, 1.8600792e-05f, 1.9809576e-05f,
+ 2.1096914e-05f, 2.2467911e-05f, 2.3928002e-05f, 2.5482978e-05f,
+ 2.7139006e-05f, 2.8902651e-05f, 3.0780908e-05f, 3.2781225e-05f,
+ 3.4911534e-05f, 3.7180282e-05f, 3.9596466e-05f, 4.2169667e-05f,
+ 4.4910090e-05f, 4.7828601e-05f, 5.0936773e-05f, 5.4246931e-05f,
+ 5.7772202e-05f, 6.1526565e-05f, 6.5524908e-05f, 6.9783085e-05f,
+ 7.4317983e-05f, 7.9147585e-05f, 8.4291040e-05f, 8.9768747e-05f,
+ 9.5602426e-05f, 0.00010181521f, 0.00010843174f, 0.00011547824f,
+ 0.00012298267f, 0.00013097477f, 0.00013948625f, 0.00014855085f,
+ 0.00015820453f, 0.00016848555f, 0.00017943469f, 0.00019109536f,
+ 0.00020351382f, 0.00021673929f, 0.00023082423f, 0.00024582449f,
+ 0.00026179955f, 0.00027881276f, 0.00029693158f, 0.00031622787f,
+ 0.00033677814f, 0.00035866388f, 0.00038197188f, 0.00040679456f,
+ 0.00043323036f, 0.00046138411f, 0.00049136745f, 0.00052329927f,
+ 0.00055730621f, 0.00059352311f, 0.00063209358f, 0.00067317058f,
+ 0.00071691700f, 0.00076350630f, 0.00081312324f, 0.00086596457f,
+ 0.00092223983f, 0.00098217216f, 0.0010459992f, 0.0011139742f,
+ 0.0011863665f, 0.0012634633f, 0.0013455702f, 0.0014330129f,
+ 0.0015261382f, 0.0016253153f, 0.0017309374f, 0.0018434235f,
+ 0.0019632195f, 0.0020908006f, 0.0022266726f, 0.0023713743f,
+ 0.0025254795f, 0.0026895994f, 0.0028643847f, 0.0030505286f,
+ 0.0032487691f, 0.0034598925f, 0.0036847358f, 0.0039241906f,
+ 0.0041792066f, 0.0044507950f, 0.0047400328f, 0.0050480668f,
+ 0.0053761186f, 0.0057254891f, 0.0060975636f, 0.0064938176f,
+ 0.0069158225f, 0.0073652516f, 0.0078438871f, 0.0083536271f,
+ 0.0088964928f, 0.009474637f, 0.010090352f, 0.010746080f,
+ 0.011444421f, 0.012188144f, 0.012980198f, 0.013823725f,
+ 0.014722068f, 0.015678791f, 0.016697687f, 0.017782797f,
+ 0.018938423f, 0.020169149f, 0.021479854f, 0.022875735f,
+ 0.024362330f, 0.025945531f, 0.027631618f, 0.029427276f,
+ 0.031339626f, 0.033376252f, 0.035545228f, 0.037855157f,
+ 0.040315199f, 0.042935108f, 0.045725273f, 0.048696758f,
+ 0.051861348f, 0.055231591f, 0.058820850f, 0.062643361f,
+ 0.066714279f, 0.071049749f, 0.075666962f, 0.080584227f,
+ 0.085821044f, 0.091398179f, 0.097337747f, 0.10366330f,
+ 0.11039993f, 0.11757434f, 0.12521498f, 0.13335215f,
+ 0.14201813f, 0.15124727f, 0.16107617f, 0.17154380f,
+ 0.18269168f, 0.19456402f, 0.20720788f, 0.22067342f,
+ 0.23501402f, 0.25028656f, 0.26655159f, 0.28387361f,
+ 0.30232132f, 0.32196786f, 0.34289114f, 0.36517414f,
+ 0.38890521f, 0.41417847f, 0.44109412f, 0.46975890f,
+ 0.50028648f, 0.53279791f, 0.56742212f, 0.60429640f,
+ 0.64356699f, 0.68538959f, 0.72993007f, 0.77736504f,
+ 0.82788260f, 0.88168307f, 0.9389798f, 1.0f
+};
+
+
+// @OPTIMIZE: if you want to replace this bresenham line-drawing routine,
+// note that you must produce bit-identical output to decode correctly;
+// this specific sequence of operations is specified in the spec (it's
+// drawing integer-quantized frequency-space lines that the encoder
+// expects to be exactly the same)
+// ... also, isn't the whole point of Bresenham's algorithm to NOT
+// have to divide in the setup? sigh.
+#ifndef STB_VORBIS_NO_DEFER_FLOOR
+#define LINE_OP(a,b) a *= b
+#else
+#define LINE_OP(a,b) a = b
+#endif
+
+#ifdef STB_VORBIS_DIVIDE_TABLE
+#define DIVTAB_NUMER 32
+#define DIVTAB_DENOM 64
+int8 integer_divide_table[DIVTAB_NUMER][DIVTAB_DENOM]; // 2KB
+#endif
+
+static __forceinline void draw_line(float *output, int x0, int y0, int x1, int y1, int n)
+{
+ int dy = y1 - y0;
+ int adx = x1 - x0;
+ int ady = abs(dy);
+ int base;
+ int x=x0,y=y0;
+ int err = 0;
+ int sy;
+
+#ifdef STB_VORBIS_DIVIDE_TABLE
+ if (adx < DIVTAB_DENOM && ady < DIVTAB_NUMER) {
+ if (dy < 0) {
+ base = -integer_divide_table[ady][adx];
+ sy = base-1;
+ } else {
+ base = integer_divide_table[ady][adx];
+ sy = base+1;
+ }
+ } else {
+ base = dy / adx;
+ if (dy < 0)
+ sy = base - 1;
+ else
+ sy = base+1;
+ }
+#else
+ base = dy / adx;
+ if (dy < 0)
+ sy = base - 1;
+ else
+ sy = base+1;
+#endif
+ ady -= abs(base) * adx;
+ if (x1 > n) x1 = n;
+ if (x < x1) {
+ LINE_OP(output[x], inverse_db_table[y&255]);
+ for (++x; x < x1; ++x) {
+ err += ady;
+ if (err >= adx) {
+ err -= adx;
+ y += sy;
+ } else
+ y += base;
+ LINE_OP(output[x], inverse_db_table[y&255]);
+ }
+ }
+}
+
+static int residue_decode(vorb *f, Codebook *book, float *target, int offset, int n, int rtype)
+{
+ int k;
+ if (rtype == 0) {
+ int step = n / book->dimensions;
+ for (k=0; k < step; ++k)
+ if (!codebook_decode_step(f, book, target+offset+k, n-offset-k, step))
+ return FALSE;
+ } else {
+ for (k=0; k < n; ) {
+ if (!codebook_decode(f, book, target+offset, n-k))
+ return FALSE;
+ k += book->dimensions;
+ offset += book->dimensions;
+ }
+ }
+ return TRUE;
+}
+
+// n is 1/2 of the blocksize --
+// specification: "Correct per-vector decode length is [n]/2"
+static void decode_residue(vorb *f, float *residue_buffers[], int ch, int n, int rn, uint8 *do_not_decode)
+{
+ int i,j,pass;
+ Residue *r = f->residue_config + rn;
+ int rtype = f->residue_types[rn];
+ int c = r->classbook;
+ int classwords = f->codebooks[c].dimensions;
+ unsigned int actual_size = rtype == 2 ? n*2 : n;
+ unsigned int limit_r_begin = (r->begin < actual_size ? r->begin : actual_size);
+ unsigned int limit_r_end = (r->end < actual_size ? r->end : actual_size);
+ int n_read = limit_r_end - limit_r_begin;
+ int part_read = n_read / r->part_size;
+ int temp_alloc_point = temp_alloc_save(f);
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ uint8 ***part_classdata = (uint8 ***) temp_block_array(f,f->channels, part_read * sizeof(**part_classdata));
+ #else
+ int **classifications = (int **) temp_block_array(f,f->channels, part_read * sizeof(**classifications));
+ #endif
+
+ CHECK(f);
+
+ for (i=0; i < ch; ++i)
+ if (!do_not_decode[i])
+ memset(residue_buffers[i], 0, sizeof(float) * n);
+
+ if (rtype == 2 && ch != 1) {
+ for (j=0; j < ch; ++j)
+ if (!do_not_decode[j])
+ break;
+ if (j == ch)
+ goto done;
+
+ for (pass=0; pass < 8; ++pass) {
+ int pcount = 0, class_set = 0;
+ if (ch == 2) {
+ while (pcount < part_read) {
+ int z = r->begin + pcount*r->part_size;
+ int c_inter = (z & 1), p_inter = z>>1;
+ if (pass == 0) {
+ Codebook *c = f->codebooks+r->classbook;
+ int q;
+ DECODE(q,f,c);
+ if (q == EOP) goto done;
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ part_classdata[0][class_set] = r->classdata[q];
+ #else
+ for (i=classwords-1; i >= 0; --i) {
+ classifications[0][i+pcount] = q % r->classifications;
+ q /= r->classifications;
+ }
+ #endif
+ }
+ for (i=0; i < classwords && pcount < part_read; ++i, ++pcount) {
+ int z = r->begin + pcount*r->part_size;
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ int c = part_classdata[0][class_set][i];
+ #else
+ int c = classifications[0][pcount];
+ #endif
+ int b = r->residue_books[c][pass];
+ if (b >= 0) {
+ Codebook *book = f->codebooks + b;
+ #ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ if (!codebook_decode_deinterleave_repeat(f, book, residue_buffers, ch, &c_inter, &p_inter, n, r->part_size))
+ goto done;
+ #else
+ // saves 1%
+ if (!codebook_decode_deinterleave_repeat(f, book, residue_buffers, ch, &c_inter, &p_inter, n, r->part_size))
+ goto done;
+ #endif
+ } else {
+ z += r->part_size;
+ c_inter = z & 1;
+ p_inter = z >> 1;
+ }
+ }
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ ++class_set;
+ #endif
+ }
+ } else if (ch > 2) {
+ while (pcount < part_read) {
+ int z = r->begin + pcount*r->part_size;
+ int c_inter = z % ch, p_inter = z/ch;
+ if (pass == 0) {
+ Codebook *c = f->codebooks+r->classbook;
+ int q;
+ DECODE(q,f,c);
+ if (q == EOP) goto done;
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ part_classdata[0][class_set] = r->classdata[q];
+ #else
+ for (i=classwords-1; i >= 0; --i) {
+ classifications[0][i+pcount] = q % r->classifications;
+ q /= r->classifications;
+ }
+ #endif
+ }
+ for (i=0; i < classwords && pcount < part_read; ++i, ++pcount) {
+ int z = r->begin + pcount*r->part_size;
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ int c = part_classdata[0][class_set][i];
+ #else
+ int c = classifications[0][pcount];
+ #endif
+ int b = r->residue_books[c][pass];
+ if (b >= 0) {
+ Codebook *book = f->codebooks + b;
+ if (!codebook_decode_deinterleave_repeat(f, book, residue_buffers, ch, &c_inter, &p_inter, n, r->part_size))
+ goto done;
+ } else {
+ z += r->part_size;
+ c_inter = z % ch;
+ p_inter = z / ch;
+ }
+ }
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ ++class_set;
+ #endif
+ }
+ }
+ }
+ goto done;
+ }
+ CHECK(f);
+
+ for (pass=0; pass < 8; ++pass) {
+ int pcount = 0, class_set=0;
+ while (pcount < part_read) {
+ if (pass == 0) {
+ for (j=0; j < ch; ++j) {
+ if (!do_not_decode[j]) {
+ Codebook *c = f->codebooks+r->classbook;
+ int temp;
+ DECODE(temp,f,c);
+ if (temp == EOP) goto done;
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ part_classdata[j][class_set] = r->classdata[temp];
+ #else
+ for (i=classwords-1; i >= 0; --i) {
+ classifications[j][i+pcount] = temp % r->classifications;
+ temp /= r->classifications;
+ }
+ #endif
+ }
+ }
+ }
+ for (i=0; i < classwords && pcount < part_read; ++i, ++pcount) {
+ for (j=0; j < ch; ++j) {
+ if (!do_not_decode[j]) {
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ int c = part_classdata[j][class_set][i];
+ #else
+ int c = classifications[j][pcount];
+ #endif
+ int b = r->residue_books[c][pass];
+ if (b >= 0) {
+ float *target = residue_buffers[j];
+ int offset = r->begin + pcount * r->part_size;
+ int n = r->part_size;
+ Codebook *book = f->codebooks + b;
+ if (!residue_decode(f, book, target, offset, n, rtype))
+ goto done;
+ }
+ }
+ }
+ }
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ ++class_set;
+ #endif
+ }
+ }
+ done:
+ CHECK(f);
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ temp_free(f,part_classdata);
+ #else
+ temp_free(f,classifications);
+ #endif
+ temp_alloc_restore(f,temp_alloc_point);
+}
+
+
+#if 0
+// slow way for debugging
+void inverse_mdct_slow(float *buffer, int n)
+{
+ int i,j;
+ int n2 = n >> 1;
+ float *x = (float *) malloc(sizeof(*x) * n2);
+ memcpy(x, buffer, sizeof(*x) * n2);
+ for (i=0; i < n; ++i) {
+ float acc = 0;
+ for (j=0; j < n2; ++j)
+ // formula from paper:
+ //acc += n/4.0f * x[j] * (float) cos(M_PI / 2 / n * (2 * i + 1 + n/2.0)*(2*j+1));
+ // formula from wikipedia
+ //acc += 2.0f / n2 * x[j] * (float) cos(M_PI/n2 * (i + 0.5 + n2/2)*(j + 0.5));
+ // these are equivalent, except the formula from the paper inverts the multiplier!
+ // however, what actually works is NO MULTIPLIER!?!
+ //acc += 64 * 2.0f / n2 * x[j] * (float) cos(M_PI/n2 * (i + 0.5 + n2/2)*(j + 0.5));
+ acc += x[j] * (float) cos(M_PI / 2 / n * (2 * i + 1 + n/2.0)*(2*j+1));
+ buffer[i] = acc;
+ }
+ free(x);
+}
+#elif 0
+// same as above, but just barely able to run in real time on modern machines
+void inverse_mdct_slow(float *buffer, int n, vorb *f, int blocktype)
+{
+   float mcos[16384]; // note: assumes n <= 4096 so the 4*n-entry table below fits
+ int i,j;
+ int n2 = n >> 1, nmask = (n << 2) -1;
+ float *x = (float *) malloc(sizeof(*x) * n2);
+ memcpy(x, buffer, sizeof(*x) * n2);
+ for (i=0; i < 4*n; ++i)
+ mcos[i] = (float) cos(M_PI / 2 * i / n);
+
+ for (i=0; i < n; ++i) {
+ float acc = 0;
+ for (j=0; j < n2; ++j)
+ acc += x[j] * mcos[(2 * i + 1 + n2)*(2*j+1) & nmask];
+ buffer[i] = acc;
+ }
+ free(x);
+}
+#elif 0
+// transform to use a slow dct-iv; this is STILL basically trivial,
+// but only requires half as many ops
+void dct_iv_slow(float *buffer, int n)
+{
+   float mcos[16384]; // note: assumes n <= 2048 so the 8*n-entry table below fits
+   float x[2048];
+ int i,j;
+ int n2 = n >> 1, nmask = (n << 3) - 1;
+ memcpy(x, buffer, sizeof(*x) * n);
+ for (i=0; i < 8*n; ++i)
+ mcos[i] = (float) cos(M_PI / 4 * i / n);
+ for (i=0; i < n; ++i) {
+ float acc = 0;
+ for (j=0; j < n; ++j)
+ acc += x[j] * mcos[((2 * i + 1)*(2*j+1)) & nmask];
+ buffer[i] = acc;
+ }
+}
+
+void inverse_mdct_slow(float *buffer, int n, vorb *f, int blocktype)
+{
+ int i, n4 = n >> 2, n2 = n >> 1, n3_4 = n - n4;
+ float temp[4096];
+
+ memcpy(temp, buffer, n2 * sizeof(float));
+ dct_iv_slow(temp, n2); // returns -c'-d, a-b'
+
+ for (i=0; i < n4 ; ++i) buffer[i] = temp[i+n4]; // a-b'
+ for ( ; i < n3_4; ++i) buffer[i] = -temp[n3_4 - i - 1]; // b-a', c+d'
+ for ( ; i < n ; ++i) buffer[i] = -temp[i - n3_4]; // c'+d
+}
+#endif
+
+#ifndef LIBVORBIS_MDCT
+#define LIBVORBIS_MDCT 0
+#endif
+
+#if LIBVORBIS_MDCT
+// directly call the vorbis MDCT using an interface documented
+// by Jeff Roberts... useful for performance comparison
+typedef struct
+{
+ int n;
+ int log2n;
+
+ float *trig;
+ int *bitrev;
+
+ float scale;
+} mdct_lookup;
+
+extern void mdct_init(mdct_lookup *lookup, int n);
+extern void mdct_clear(mdct_lookup *l);
+extern void mdct_backward(mdct_lookup *init, float *in, float *out);
+
+mdct_lookup M1,M2;
+
+void inverse_mdct(float *buffer, int n, vorb *f, int blocktype)
+{
+ mdct_lookup *M;
+ if (M1.n == n) M = &M1;
+ else if (M2.n == n) M = &M2;
+ else if (M1.n == 0) { mdct_init(&M1, n); M = &M1; }
+ else {
+      if (M2.n) __asm int 3; // hard breakpoint (MSVC inline asm): a third distinct block size is unexpected
+ mdct_init(&M2, n);
+ M = &M2;
+ }
+
+ mdct_backward(M, buffer, buffer);
+}
+#endif
+
+
+// the following were split out into separate functions while optimizing;
+// they could be pushed back up but eh. __forceinline showed no change;
+// they're probably already being inlined.
+static void imdct_step3_iter0_loop(int n, float *e, int i_off, int k_off, float *A)
+{
+ float *ee0 = e + i_off;
+ float *ee2 = ee0 + k_off;
+ int i;
+
+ assert((n & 3) == 0);
+ for (i=(n>>2); i > 0; --i) {
+ float k00_20, k01_21;
+ k00_20 = ee0[ 0] - ee2[ 0];
+ k01_21 = ee0[-1] - ee2[-1];
+ ee0[ 0] += ee2[ 0];//ee0[ 0] = ee0[ 0] + ee2[ 0];
+ ee0[-1] += ee2[-1];//ee0[-1] = ee0[-1] + ee2[-1];
+ ee2[ 0] = k00_20 * A[0] - k01_21 * A[1];
+ ee2[-1] = k01_21 * A[0] + k00_20 * A[1];
+ A += 8;
+
+ k00_20 = ee0[-2] - ee2[-2];
+ k01_21 = ee0[-3] - ee2[-3];
+ ee0[-2] += ee2[-2];//ee0[-2] = ee0[-2] + ee2[-2];
+ ee0[-3] += ee2[-3];//ee0[-3] = ee0[-3] + ee2[-3];
+ ee2[-2] = k00_20 * A[0] - k01_21 * A[1];
+ ee2[-3] = k01_21 * A[0] + k00_20 * A[1];
+ A += 8;
+
+ k00_20 = ee0[-4] - ee2[-4];
+ k01_21 = ee0[-5] - ee2[-5];
+ ee0[-4] += ee2[-4];//ee0[-4] = ee0[-4] + ee2[-4];
+ ee0[-5] += ee2[-5];//ee0[-5] = ee0[-5] + ee2[-5];
+ ee2[-4] = k00_20 * A[0] - k01_21 * A[1];
+ ee2[-5] = k01_21 * A[0] + k00_20 * A[1];
+ A += 8;
+
+ k00_20 = ee0[-6] - ee2[-6];
+ k01_21 = ee0[-7] - ee2[-7];
+ ee0[-6] += ee2[-6];//ee0[-6] = ee0[-6] + ee2[-6];
+ ee0[-7] += ee2[-7];//ee0[-7] = ee0[-7] + ee2[-7];
+ ee2[-6] = k00_20 * A[0] - k01_21 * A[1];
+ ee2[-7] = k01_21 * A[0] + k00_20 * A[1];
+ A += 8;
+ ee0 -= 8;
+ ee2 -= 8;
+ }
+}
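The loop body above is one radix-2 butterfly with a complex twiddle, unrolled four times. Pulled out as a standalone sketch (not part of stb_vorbis; `butterfly_sketch` is a hypothetical name), with values stored as (imag, real) pairs at offsets [-1] and [0] exactly as in `imdct_step3_iter0_loop`:

```c
#include <assert.h>

// One butterfly step: e0 <- e0 + e2, e2 <- (e0 - e2) * (A0 + i*A1),
// with each "complex" value stored as a pair at [-1] (imag) and [0] (real).
static void butterfly_sketch(float *e0, float *e2, float A0, float A1)
{
   float k00 = e0[ 0] - e2[ 0];
   float k01 = e0[-1] - e2[-1];
   e0[ 0] += e2[ 0];
   e0[-1] += e2[-1];
   e2[ 0] = k00 * A0 - k01 * A1;
   e2[-1] = k01 * A0 + k00 * A1;
}
```

With the identity twiddle (A0=1, A1=0) this degenerates to a plain sum/difference pair, which makes it easy to verify by hand.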
+
+static void imdct_step3_inner_r_loop(int lim, float *e, int d0, int k_off, float *A, int k1)
+{
+ int i;
+ float k00_20, k01_21;
+
+ float *e0 = e + d0;
+ float *e2 = e0 + k_off;
+
+ for (i=lim >> 2; i > 0; --i) {
+ k00_20 = e0[-0] - e2[-0];
+ k01_21 = e0[-1] - e2[-1];
+ e0[-0] += e2[-0];//e0[-0] = e0[-0] + e2[-0];
+ e0[-1] += e2[-1];//e0[-1] = e0[-1] + e2[-1];
+ e2[-0] = (k00_20)*A[0] - (k01_21) * A[1];
+ e2[-1] = (k01_21)*A[0] + (k00_20) * A[1];
+
+ A += k1;
+
+ k00_20 = e0[-2] - e2[-2];
+ k01_21 = e0[-3] - e2[-3];
+ e0[-2] += e2[-2];//e0[-2] = e0[-2] + e2[-2];
+ e0[-3] += e2[-3];//e0[-3] = e0[-3] + e2[-3];
+ e2[-2] = (k00_20)*A[0] - (k01_21) * A[1];
+ e2[-3] = (k01_21)*A[0] + (k00_20) * A[1];
+
+ A += k1;
+
+ k00_20 = e0[-4] - e2[-4];
+ k01_21 = e0[-5] - e2[-5];
+ e0[-4] += e2[-4];//e0[-4] = e0[-4] + e2[-4];
+ e0[-5] += e2[-5];//e0[-5] = e0[-5] + e2[-5];
+ e2[-4] = (k00_20)*A[0] - (k01_21) * A[1];
+ e2[-5] = (k01_21)*A[0] + (k00_20) * A[1];
+
+ A += k1;
+
+ k00_20 = e0[-6] - e2[-6];
+ k01_21 = e0[-7] - e2[-7];
+ e0[-6] += e2[-6];//e0[-6] = e0[-6] + e2[-6];
+ e0[-7] += e2[-7];//e0[-7] = e0[-7] + e2[-7];
+ e2[-6] = (k00_20)*A[0] - (k01_21) * A[1];
+ e2[-7] = (k01_21)*A[0] + (k00_20) * A[1];
+
+ e0 -= 8;
+ e2 -= 8;
+
+ A += k1;
+ }
+}
+
+static void imdct_step3_inner_s_loop(int n, float *e, int i_off, int k_off, float *A, int a_off, int k0)
+{
+ int i;
+ float A0 = A[0];
+ float A1 = A[0+1];
+ float A2 = A[0+a_off];
+ float A3 = A[0+a_off+1];
+ float A4 = A[0+a_off*2+0];
+ float A5 = A[0+a_off*2+1];
+ float A6 = A[0+a_off*3+0];
+ float A7 = A[0+a_off*3+1];
+
+ float k00,k11;
+
+ float *ee0 = e +i_off;
+ float *ee2 = ee0+k_off;
+
+ for (i=n; i > 0; --i) {
+ k00 = ee0[ 0] - ee2[ 0];
+ k11 = ee0[-1] - ee2[-1];
+ ee0[ 0] = ee0[ 0] + ee2[ 0];
+ ee0[-1] = ee0[-1] + ee2[-1];
+ ee2[ 0] = (k00) * A0 - (k11) * A1;
+ ee2[-1] = (k11) * A0 + (k00) * A1;
+
+ k00 = ee0[-2] - ee2[-2];
+ k11 = ee0[-3] - ee2[-3];
+ ee0[-2] = ee0[-2] + ee2[-2];
+ ee0[-3] = ee0[-3] + ee2[-3];
+ ee2[-2] = (k00) * A2 - (k11) * A3;
+ ee2[-3] = (k11) * A2 + (k00) * A3;
+
+ k00 = ee0[-4] - ee2[-4];
+ k11 = ee0[-5] - ee2[-5];
+ ee0[-4] = ee0[-4] + ee2[-4];
+ ee0[-5] = ee0[-5] + ee2[-5];
+ ee2[-4] = (k00) * A4 - (k11) * A5;
+ ee2[-5] = (k11) * A4 + (k00) * A5;
+
+ k00 = ee0[-6] - ee2[-6];
+ k11 = ee0[-7] - ee2[-7];
+ ee0[-6] = ee0[-6] + ee2[-6];
+ ee0[-7] = ee0[-7] + ee2[-7];
+ ee2[-6] = (k00) * A6 - (k11) * A7;
+ ee2[-7] = (k11) * A6 + (k00) * A7;
+
+ ee0 -= k0;
+ ee2 -= k0;
+ }
+}
+
+static __forceinline void iter_54(float *z)
+{
+ float k00,k11,k22,k33;
+ float y0,y1,y2,y3;
+
+ k00 = z[ 0] - z[-4];
+ y0 = z[ 0] + z[-4];
+ y2 = z[-2] + z[-6];
+ k22 = z[-2] - z[-6];
+
+ z[-0] = y0 + y2; // z0 + z4 + z2 + z6
+ z[-2] = y0 - y2; // z0 + z4 - z2 - z6
+
+ // done with y0,y2
+
+ k33 = z[-3] - z[-7];
+
+ z[-4] = k00 + k33; // z0 - z4 + z3 - z7
+ z[-6] = k00 - k33; // z0 - z4 - z3 + z7
+
+ // done with k33
+
+ k11 = z[-1] - z[-5];
+ y1 = z[-1] + z[-5];
+ y3 = z[-3] + z[-7];
+
+ z[-1] = y1 + y3; // z1 + z5 + z3 + z7
+ z[-3] = y1 - y3; // z1 + z5 - z3 - z7
+ z[-5] = k11 - k22; // z1 - z5 + z2 - z6
+ z[-7] = k11 + k22; // z1 - z5 - z2 + z6
+}
+
+static void imdct_step3_inner_s_loop_ld654(int n, float *e, int i_off, float *A, int base_n)
+{
+ int a_off = base_n >> 3;
+ float A2 = A[0+a_off];
+ float *z = e + i_off;
+ float *base = z - 16 * n;
+
+ while (z > base) {
+ float k00,k11;
+ float l00,l11;
+
+ k00 = z[-0] - z[ -8];
+ k11 = z[-1] - z[ -9];
+ l00 = z[-2] - z[-10];
+ l11 = z[-3] - z[-11];
+ z[ -0] = z[-0] + z[ -8];
+ z[ -1] = z[-1] + z[ -9];
+ z[ -2] = z[-2] + z[-10];
+ z[ -3] = z[-3] + z[-11];
+ z[ -8] = k00;
+ z[ -9] = k11;
+ z[-10] = (l00+l11) * A2;
+ z[-11] = (l11-l00) * A2;
+
+ k00 = z[ -4] - z[-12];
+ k11 = z[ -5] - z[-13];
+ l00 = z[ -6] - z[-14];
+ l11 = z[ -7] - z[-15];
+ z[ -4] = z[ -4] + z[-12];
+ z[ -5] = z[ -5] + z[-13];
+ z[ -6] = z[ -6] + z[-14];
+ z[ -7] = z[ -7] + z[-15];
+ z[-12] = k11;
+ z[-13] = -k00;
+ z[-14] = (l11-l00) * A2;
+ z[-15] = (l00+l11) * -A2;
+
+ iter_54(z);
+ iter_54(z-8);
+ z -= 16;
+ }
+}
+
+static void inverse_mdct(float *buffer, int n, vorb *f, int blocktype)
+{
+ int n2 = n >> 1, n4 = n >> 2, n8 = n >> 3, l;
+ int ld;
+ // @OPTIMIZE: reduce register pressure by using fewer variables?
+ int save_point = temp_alloc_save(f);
+ float *buf2 = (float *) temp_alloc(f, n2 * sizeof(*buf2));
+ float *u=NULL,*v=NULL;
+ // twiddle factors
+ float *A = f->A[blocktype];
+
+ // IMDCT algorithm from "The use of multirate filter banks for coding of high quality digital audio"
+ // See notes about bugs in that paper in less-optimal implementation 'inverse_mdct_old' after this function.
+
+ // kernel from paper
+
+
+ // merged:
+ // copy and reflect spectral data
+ // step 0
+
+ // note that it turns out that the items added together during
+ // this step are, in fact, being added to themselves (as reflected
+ // by step 0). inexplicable inefficiency! this became obvious
+ // once I combined the passes.
+
+ // so there's a missing 'times 2' here (for adding X to itself).
+ // this propagates through linearly to the end, where the numbers
+ // are 1/2 too small, and need to be compensated for.
+
+ {
+ float *d,*e, *AA, *e_stop;
+ d = &buf2[n2-2];
+ AA = A;
+ e = &buffer[0];
+ e_stop = &buffer[n2];
+ while (e != e_stop) {
+ d[1] = (e[0] * AA[0] - e[2]*AA[1]);
+ d[0] = (e[0] * AA[1] + e[2]*AA[0]);
+ d -= 2;
+ AA += 2;
+ e += 4;
+ }
+
+ e = &buffer[n2-3];
+ while (d >= buf2) {
+ d[1] = (-e[2] * AA[0] - -e[0]*AA[1]);
+ d[0] = (-e[2] * AA[1] + -e[0]*AA[0]);
+ d -= 2;
+ AA += 2;
+ e -= 4;
+ }
+ }
+
+ // now we use symbolic names for these, so that we can
+ // possibly swap their meaning as we change which operations
+ // are in place
+
+ u = buffer;
+ v = buf2;
+
+ // step 2 (paper output is w, now u)
+ // this could be in place, but the data ends up in the wrong
+ // place... _somebody_'s got to swap it, so this is nominated
+ {
+ float *AA = &A[n2-8];
+ float *d0,*d1, *e0, *e1;
+
+ e0 = &v[n4];
+ e1 = &v[0];
+
+ d0 = &u[n4];
+ d1 = &u[0];
+
+ while (AA >= A) {
+ float v40_20, v41_21;
+
+ v41_21 = e0[1] - e1[1];
+ v40_20 = e0[0] - e1[0];
+ d0[1] = e0[1] + e1[1];
+ d0[0] = e0[0] + e1[0];
+ d1[1] = v41_21*AA[4] - v40_20*AA[5];
+ d1[0] = v40_20*AA[4] + v41_21*AA[5];
+
+ v41_21 = e0[3] - e1[3];
+ v40_20 = e0[2] - e1[2];
+ d0[3] = e0[3] + e1[3];
+ d0[2] = e0[2] + e1[2];
+ d1[3] = v41_21*AA[0] - v40_20*AA[1];
+ d1[2] = v40_20*AA[0] + v41_21*AA[1];
+
+ AA -= 8;
+
+ d0 += 4;
+ d1 += 4;
+ e0 += 4;
+ e1 += 4;
+ }
+ }
+
+ // step 3
+ ld = ilog(n) - 1; // ilog is off-by-one from normal definitions
+
+ // optimized step 3:
+
+ // the original step3 loop can be nested r inside s or s inside r;
+ // it's written originally as s inside r, but this is dumb when r
+ // iterates many times, and s few. So I have two copies of it and
+ // switch between them halfway.
+
+ // this is iteration 0 of step 3
+ imdct_step3_iter0_loop(n >> 4, u, n2-1-n4*0, -(n >> 3), A);
+ imdct_step3_iter0_loop(n >> 4, u, n2-1-n4*1, -(n >> 3), A);
+
+ // this is iteration 1 of step 3
+ imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*0, -(n >> 4), A, 16);
+ imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*1, -(n >> 4), A, 16);
+ imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*2, -(n >> 4), A, 16);
+ imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*3, -(n >> 4), A, 16);
+
+ l=2;
+ for (; l < (ld-3)>>1; ++l) {
+ int k0 = n >> (l+2), k0_2 = k0>>1;
+ int lim = 1 << (l+1);
+ int i;
+ for (i=0; i < lim; ++i)
+ imdct_step3_inner_r_loop(n >> (l+4), u, n2-1 - k0*i, -k0_2, A, 1 << (l+3));
+ }
+
+ for (; l < ld-6; ++l) {
+ int k0 = n >> (l+2), k1 = 1 << (l+3), k0_2 = k0>>1;
+ int rlim = n >> (l+6), r;
+ int lim = 1 << (l+1);
+ int i_off;
+ float *A0 = A;
+ i_off = n2-1;
+ for (r=rlim; r > 0; --r) {
+ imdct_step3_inner_s_loop(lim, u, i_off, -k0_2, A0, k1, k0);
+ A0 += k1*4;
+ i_off -= 8;
+ }
+ }
+
+ // iterations with count:
+ // ld-6,-5,-4 all interleaved together
+ // the big win comes from getting rid of needless flops
+ // due to the constants on pass 5 & 4 being all 1 and 0;
+ // combining them to be simultaneous to improve cache made little difference
+ imdct_step3_inner_s_loop_ld654(n >> 5, u, n2-1, A, n);
+
+ // output is u
+
+ // step 4, 5, and 6
+ // cannot be in-place because of step 5
+ {
+ uint16 *bitrev = f->bit_reverse[blocktype];
+ // weirdly, I'd have thought reading sequentially and writing
+ // erratically would have been better than vice-versa, but in
+ // fact that's not what my testing showed. (That is, with
+ // j = bitreverse(i), do you read i and write j, or read j and write i.)
+
+ float *d0 = &v[n4-4];
+ float *d1 = &v[n2-4];
+ while (d0 >= v) {
+ int k4;
+
+ k4 = bitrev[0];
+ d1[3] = u[k4+0];
+ d1[2] = u[k4+1];
+ d0[3] = u[k4+2];
+ d0[2] = u[k4+3];
+
+ k4 = bitrev[1];
+ d1[1] = u[k4+0];
+ d1[0] = u[k4+1];
+ d0[1] = u[k4+2];
+ d0[0] = u[k4+3];
+
+ d0 -= 4;
+ d1 -= 4;
+ bitrev += 2;
+ }
+ }
+ // (paper output is u, now v)
+
+
+ // data must be in buf2
+ assert(v == buf2);
+
+ // step 7 (paper output is v, now v)
+ // this is now in place
+ {
+ float *C = f->C[blocktype];
+ float *d, *e;
+
+ d = v;
+ e = v + n2 - 4;
+
+ while (d < e) {
+ float a02,a11,b0,b1,b2,b3;
+
+ a02 = d[0] - e[2];
+ a11 = d[1] + e[3];
+
+ b0 = C[1]*a02 + C[0]*a11;
+ b1 = C[1]*a11 - C[0]*a02;
+
+ b2 = d[0] + e[ 2];
+ b3 = d[1] - e[ 3];
+
+ d[0] = b2 + b0;
+ d[1] = b3 + b1;
+ e[2] = b2 - b0;
+ e[3] = b1 - b3;
+
+ a02 = d[2] - e[0];
+ a11 = d[3] + e[1];
+
+ b0 = C[3]*a02 + C[2]*a11;
+ b1 = C[3]*a11 - C[2]*a02;
+
+ b2 = d[2] + e[ 0];
+ b3 = d[3] - e[ 1];
+
+ d[2] = b2 + b0;
+ d[3] = b3 + b1;
+ e[0] = b2 - b0;
+ e[1] = b1 - b3;
+
+ C += 4;
+ d += 4;
+ e -= 4;
+ }
+ }
+
+ // data must be in buf2
+
+
+ // step 8+decode (paper output is X, now buffer)
+ // this generates pairs of data a la 8 and pushes them directly through
+ // the decode kernel (pushing rather than pulling) to avoid having
+ // to make another pass later
+
+ // this cannot POSSIBLY be in place, so we refer to the buffers directly
+
+ {
+ float *d0,*d1,*d2,*d3;
+
+ float *B = f->B[blocktype] + n2 - 8;
+ float *e = buf2 + n2 - 8;
+ d0 = &buffer[0];
+ d1 = &buffer[n2-4];
+ d2 = &buffer[n2];
+ d3 = &buffer[n-4];
+ while (e >= v) {
+ float p0,p1,p2,p3;
+
+ p3 = e[6]*B[7] - e[7]*B[6];
+ p2 = -e[6]*B[6] - e[7]*B[7];
+
+ d0[0] = p3;
+ d1[3] = - p3;
+ d2[0] = p2;
+ d3[3] = p2;
+
+ p1 = e[4]*B[5] - e[5]*B[4];
+ p0 = -e[4]*B[4] - e[5]*B[5];
+
+ d0[1] = p1;
+ d1[2] = - p1;
+ d2[1] = p0;
+ d3[2] = p0;
+
+ p3 = e[2]*B[3] - e[3]*B[2];
+ p2 = -e[2]*B[2] - e[3]*B[3];
+
+ d0[2] = p3;
+ d1[1] = - p3;
+ d2[2] = p2;
+ d3[1] = p2;
+
+ p1 = e[0]*B[1] - e[1]*B[0];
+ p0 = -e[0]*B[0] - e[1]*B[1];
+
+ d0[3] = p1;
+ d1[0] = - p1;
+ d2[3] = p0;
+ d3[0] = p0;
+
+ B -= 8;
+ e -= 8;
+ d0 += 4;
+ d2 += 4;
+ d1 -= 4;
+ d3 -= 4;
+ }
+ }
+
+ temp_free(f,buf2);
+ temp_alloc_restore(f,save_point);
+}
+
+#if 0
+// this is the original version of the above code, if you want to optimize it from scratch
+void inverse_mdct_naive(float *buffer, int n)
+{
+ float s;
+ float A[1 << 12], B[1 << 12], C[1 << 11];
+ int i,k,k2,k4, n2 = n >> 1, n4 = n >> 2, n8 = n >> 3, l;
+ int n3_4 = n - n4, ld;
+ // how can they claim this only uses N words?!
+ // oh, because they're only used sparsely, whoops
+ float u[1 << 13], X[1 << 13], v[1 << 13], w[1 << 13];
+ // set up twiddle factors
+
+ for (k=k2=0; k < n4; ++k,k2+=2) {
+ A[k2 ] = (float) cos(4*k*M_PI/n);
+ A[k2+1] = (float) -sin(4*k*M_PI/n);
+ B[k2 ] = (float) cos((k2+1)*M_PI/n/2);
+ B[k2+1] = (float) sin((k2+1)*M_PI/n/2);
+ }
+ for (k=k2=0; k < n8; ++k,k2+=2) {
+ C[k2 ] = (float) cos(2*(k2+1)*M_PI/n);
+ C[k2+1] = (float) -sin(2*(k2+1)*M_PI/n);
+ }
+
+ // IMDCT algorithm from "The use of multirate filter banks for coding of high quality digital audio"
+ // Note there are bugs in that pseudocode, presumably due to them attempting
+ // to rename the arrays nicely rather than representing the way their actual
+ // implementation bounces buffers back and forth. As a result, even in the
+ // "some formulars corrected" version, a direct implementation fails. These
+ // are noted below as "paper bug".
+
+ // copy and reflect spectral data
+ for (k=0; k < n2; ++k) u[k] = buffer[k];
+ for ( ; k < n ; ++k) u[k] = -buffer[n - k - 1];
+ // kernel from paper
+ // step 1
+ for (k=k2=k4=0; k < n4; k+=1, k2+=2, k4+=4) {
+ v[n-k4-1] = (u[k4] - u[n-k4-1]) * A[k2] - (u[k4+2] - u[n-k4-3])*A[k2+1];
+ v[n-k4-3] = (u[k4] - u[n-k4-1]) * A[k2+1] + (u[k4+2] - u[n-k4-3])*A[k2];
+ }
+ // step 2
+ for (k=k4=0; k < n8; k+=1, k4+=4) {
+ w[n2+3+k4] = v[n2+3+k4] + v[k4+3];
+ w[n2+1+k4] = v[n2+1+k4] + v[k4+1];
+ w[k4+3] = (v[n2+3+k4] - v[k4+3])*A[n2-4-k4] - (v[n2+1+k4]-v[k4+1])*A[n2-3-k4];
+ w[k4+1] = (v[n2+1+k4] - v[k4+1])*A[n2-4-k4] + (v[n2+3+k4]-v[k4+3])*A[n2-3-k4];
+ }
+ // step 3
+ ld = ilog(n) - 1; // ilog is off-by-one from normal definitions
+ for (l=0; l < ld-3; ++l) {
+ int k0 = n >> (l+2), k1 = 1 << (l+3);
+ int rlim = n >> (l+4), r4, r;
+ int s2lim = 1 << (l+2), s2;
+ for (r=r4=0; r < rlim; r4+=4,++r) {
+ for (s2=0; s2 < s2lim; s2+=2) {
+ u[n-1-k0*s2-r4] = w[n-1-k0*s2-r4] + w[n-1-k0*(s2+1)-r4];
+ u[n-3-k0*s2-r4] = w[n-3-k0*s2-r4] + w[n-3-k0*(s2+1)-r4];
+ u[n-1-k0*(s2+1)-r4] = (w[n-1-k0*s2-r4] - w[n-1-k0*(s2+1)-r4]) * A[r*k1]
+ - (w[n-3-k0*s2-r4] - w[n-3-k0*(s2+1)-r4]) * A[r*k1+1];
+ u[n-3-k0*(s2+1)-r4] = (w[n-3-k0*s2-r4] - w[n-3-k0*(s2+1)-r4]) * A[r*k1]
+ + (w[n-1-k0*s2-r4] - w[n-1-k0*(s2+1)-r4]) * A[r*k1+1];
+ }
+ }
+ if (l+1 < ld-3) {
+ // paper bug: ping-ponging of u&w here is omitted
+ memcpy(w, u, sizeof(u));
+ }
+ }
+
+ // step 4
+ for (i=0; i < n8; ++i) {
+ int j = bit_reverse(i) >> (32-ld+3);
+ assert(j < n8);
+ if (i == j) {
+ // paper bug: original code probably swapped in place; if copying,
+ // need to directly copy in this case
+ int i8 = i << 3;
+ v[i8+1] = u[i8+1];
+ v[i8+3] = u[i8+3];
+ v[i8+5] = u[i8+5];
+ v[i8+7] = u[i8+7];
+ } else if (i < j) {
+ int i8 = i << 3, j8 = j << 3;
+ v[j8+1] = u[i8+1], v[i8+1] = u[j8 + 1];
+ v[j8+3] = u[i8+3], v[i8+3] = u[j8 + 3];
+ v[j8+5] = u[i8+5], v[i8+5] = u[j8 + 5];
+ v[j8+7] = u[i8+7], v[i8+7] = u[j8 + 7];
+ }
+ }
+ // step 5
+ for (k=0; k < n2; ++k) {
+ w[k] = v[k*2+1];
+ }
+ // step 6
+ for (k=k2=k4=0; k < n8; ++k, k2 += 2, k4 += 4) {
+ u[n-1-k2] = w[k4];
+ u[n-2-k2] = w[k4+1];
+ u[n3_4 - 1 - k2] = w[k4+2];
+ u[n3_4 - 2 - k2] = w[k4+3];
+ }
+ // step 7
+ for (k=k2=0; k < n8; ++k, k2 += 2) {
+ v[n2 + k2 ] = ( u[n2 + k2] + u[n-2-k2] + C[k2+1]*(u[n2+k2]-u[n-2-k2]) + C[k2]*(u[n2+k2+1]+u[n-2-k2+1]))/2;
+ v[n-2 - k2] = ( u[n2 + k2] + u[n-2-k2] - C[k2+1]*(u[n2+k2]-u[n-2-k2]) - C[k2]*(u[n2+k2+1]+u[n-2-k2+1]))/2;
+ v[n2+1+ k2] = ( u[n2+1+k2] - u[n-1-k2] + C[k2+1]*(u[n2+1+k2]+u[n-1-k2]) - C[k2]*(u[n2+k2]-u[n-2-k2]))/2;
+ v[n-1 - k2] = (-u[n2+1+k2] + u[n-1-k2] + C[k2+1]*(u[n2+1+k2]+u[n-1-k2]) - C[k2]*(u[n2+k2]-u[n-2-k2]))/2;
+ }
+ // step 8
+ for (k=k2=0; k < n4; ++k,k2 += 2) {
+ X[k] = v[k2+n2]*B[k2 ] + v[k2+1+n2]*B[k2+1];
+ X[n2-1-k] = v[k2+n2]*B[k2+1] - v[k2+1+n2]*B[k2 ];
+ }
+
+ // decode kernel to output
+ // determined the following value experimentally
+ // (by first figuring out what made inverse_mdct_slow work); then matching that here
+ // (probably vorbis encoder premultiplies by n or n/2, to save it on the decoder?)
+ s = 0.5; // theoretically would be n4
+
+ // [[[ note! the s value of 0.5 is compensated for by the B[] in the current code,
+ // so it needs to use the "old" B values to behave correctly, or else
+ // set s to 1.0 ]]]
+ for (i=0; i < n4 ; ++i) buffer[i] = s * X[i+n4];
+ for ( ; i < n3_4; ++i) buffer[i] = -s * X[n3_4 - i - 1];
+ for ( ; i < n ; ++i) buffer[i] = -s * X[i - n3_4];
+}
+#endif
+
+static float *get_window(vorb *f, int len)
+{
+ len <<= 1;
+ if (len == f->blocksize_0) return f->window[0];
+ if (len == f->blocksize_1) return f->window[1];
+ return NULL;
+}
+
+#ifndef STB_VORBIS_NO_DEFER_FLOOR
+typedef int16 YTYPE;
+#else
+typedef int YTYPE;
+#endif
+static int do_floor(vorb *f, Mapping *map, int i, int n, float *target, YTYPE *finalY, uint8 *step2_flag)
+{
+ int n2 = n >> 1;
+ int s = map->chan[i].mux, floor;
+ floor = map->submap_floor[s];
+ if (f->floor_types[floor] == 0) {
+ return error(f, VORBIS_invalid_stream);
+ } else {
+ Floor1 *g = &f->floor_config[floor].floor1;
+ int j,q;
+ int lx = 0, ly = finalY[0] * g->floor1_multiplier;
+ for (q=1; q < g->values; ++q) {
+ j = g->sorted_order[q];
+ #ifndef STB_VORBIS_NO_DEFER_FLOOR
+ STBV_NOTUSED(step2_flag);
+ if (finalY[j] >= 0)
+ #else
+ if (step2_flag[j])
+ #endif
+ {
+ int hy = finalY[j] * g->floor1_multiplier;
+ int hx = g->Xlist[j];
+ if (lx != hx)
+ draw_line(target, lx,ly, hx,hy, n2);
+ CHECK(f);
+ lx = hx, ly = hy;
+ }
+ }
+ if (lx < n2) {
+ // optimization of: draw_line(target, lx,ly, n,ly, n2);
+ for (j=lx; j < n2; ++j)
+ LINE_OP(target[j], inverse_db_table[ly]);
+ CHECK(f);
+ }
+ }
+ return TRUE;
+}
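`do_floor` connects the decoded floor points with line segments; the Y values it connects were earlier predicted from each point's two neighbors. A standalone sketch of that prediction rule (the Vorbis I spec's "render_point", which stb_vorbis implements as `predict_point`; `render_point_sketch` is a hypothetical name) — assuming x0 < x < x1:

```c
#include <assert.h>
#include <stdlib.h>

// Linearly interpolate between neighbor points (x0,y0) and (x1,y1) at x,
// using truncating integer division, per the Vorbis I floor1 render_point rule.
static int render_point_sketch(int x, int x0, int x1, int y0, int y1)
{
   int dy  = y1 - y0;
   int adx = x1 - x0;
   int off = abs(dy) * (x - x0) / adx;   // truncates toward zero
   return dy < 0 ? y0 - off : y0 + off;
}
```

The abs/branch structure exists so the truncation always rounds toward y0, matching the encoder's prediction regardless of slope sign.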
+
+// The meaning of "left" and "right"
+//
+// For a given frame:
+// we compute samples from 0..n
+// window_center is n/2
+// we'll window and mix the samples from left_start to left_end with data from the previous frame
+// all of the samples from left_end to right_start can be output without mixing; however,
+// this interval is 0-length except when transitioning between short and long frames
+// all of the samples from right_start to right_end need to be mixed with the next frame,
+// which we don't have, so those get saved in a buffer
+// frame N's right_end-right_start, the number of samples to mix with the next frame,
+// has to be the same as frame N+1's left_end-left_start (which they are by
+// construction)
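The overlap arithmetic described above can be sketched standalone (not part of stb_vorbis; `window_regions_sketch` is a hypothetical name mirroring the computation in `vorbis_decode_initial` below, with n0/n1 the short/long blocksizes and prev/next the adjacent-frame block flags):

```c
#include <assert.h>

typedef struct { int left_start, left_end, right_start, right_end; } window_regions;

static window_regions window_regions_sketch(int n0, int n1, int blockflag, int prev, int next)
{
   window_regions w;
   int n = blockflag ? n1 : n0;
   int window_center = n >> 1;
   if (blockflag && !prev) {            // long block following a short one
      w.left_start = (n - n0) >> 2;
      w.left_end   = (n + n0) >> 2;
   } else {
      w.left_start = 0;
      w.left_end   = window_center;
   }
   if (blockflag && !next) {            // long block preceding a short one
      w.right_start = (n*3 - n0) >> 2;
      w.right_end   = (n*3 + n0) >> 2;
   } else {
      w.right_start = window_center;
      w.right_end   = n;
   }
   return w;
}
```

For n0=256, n1=2048, a long block after a short one gets left_start=448, left_end=576, i.e. an overlap of 128 = n0/2 — the same as the short block's right overlap, which is the "by construction" equality noted above.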
+
+static int vorbis_decode_initial(vorb *f, int *p_left_start, int *p_left_end, int *p_right_start, int *p_right_end, int *mode)
+{
+ Mode *m;
+ int i, n, prev, next, window_center;
+ f->channel_buffer_start = f->channel_buffer_end = 0;
+
+ retry:
+ if (f->eof) return FALSE;
+ if (!maybe_start_packet(f))
+ return FALSE;
+ // check packet type
+ if (get_bits(f,1) != 0) {
+ if (IS_PUSH_MODE(f))
+ return error(f,VORBIS_bad_packet_type);
+ while (EOP != get8_packet(f));
+ goto retry;
+ }
+
+ if (f->alloc.alloc_buffer)
+ assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+
+ i = get_bits(f, ilog(f->mode_count-1));
+ if (i == EOP) return FALSE;
+ if (i >= f->mode_count) return FALSE;
+ *mode = i;
+ m = f->mode_config + i;
+ if (m->blockflag) {
+ n = f->blocksize_1;
+ prev = get_bits(f,1);
+ next = get_bits(f,1);
+ } else {
+ prev = next = 0;
+ n = f->blocksize_0;
+ }
+
+// WINDOWING
+
+ window_center = n >> 1;
+ if (m->blockflag && !prev) {
+ *p_left_start = (n - f->blocksize_0) >> 2;
+ *p_left_end = (n + f->blocksize_0) >> 2;
+ } else {
+ *p_left_start = 0;
+ *p_left_end = window_center;
+ }
+ if (m->blockflag && !next) {
+ *p_right_start = (n*3 - f->blocksize_0) >> 2;
+ *p_right_end = (n*3 + f->blocksize_0) >> 2;
+ } else {
+ *p_right_start = window_center;
+ *p_right_end = n;
+ }
+
+ return TRUE;
+}
+
+static int vorbis_decode_packet_rest(vorb *f, int *len, Mode *m, int left_start, int left_end, int right_start, int right_end, int *p_left)
+{
+ Mapping *map;
+ int i,j,k,n,n2;
+ int zero_channel[256];
+ int really_zero_channel[256];
+
+// WINDOWING
+
+ STBV_NOTUSED(left_end);
+ n = f->blocksize[m->blockflag];
+ map = &f->mapping[m->mapping];
+
+// FLOORS
+ n2 = n >> 1;
+
+ CHECK(f);
+
+ for (i=0; i < f->channels; ++i) {
+ int s = map->chan[i].mux, floor;
+ zero_channel[i] = FALSE;
+ floor = map->submap_floor[s];
+ if (f->floor_types[floor] == 0) {
+ return error(f, VORBIS_invalid_stream);
+ } else {
+ Floor1 *g = &f->floor_config[floor].floor1;
+ if (get_bits(f, 1)) {
+ short *finalY;
+ uint8 step2_flag[256];
+ static int range_list[4] = { 256, 128, 86, 64 };
+ int range = range_list[g->floor1_multiplier-1];
+ int offset = 2;
+ finalY = f->finalY[i];
+ finalY[0] = get_bits(f, ilog(range)-1);
+ finalY[1] = get_bits(f, ilog(range)-1);
+ for (j=0; j < g->partitions; ++j) {
+ int pclass = g->partition_class_list[j];
+ int cdim = g->class_dimensions[pclass];
+ int cbits = g->class_subclasses[pclass];
+ int csub = (1 << cbits)-1;
+ int cval = 0;
+ if (cbits) {
+ Codebook *c = f->codebooks + g->class_masterbooks[pclass];
+ DECODE(cval,f,c);
+ }
+ for (k=0; k < cdim; ++k) {
+ int book = g->subclass_books[pclass][cval & csub];
+ cval = cval >> cbits;
+ if (book >= 0) {
+ int temp;
+ Codebook *c = f->codebooks + book;
+ DECODE(temp,f,c);
+ finalY[offset++] = temp;
+ } else
+ finalY[offset++] = 0;
+ }
+ }
+ if (f->valid_bits == INVALID_BITS) goto error; // behavior according to spec
+ step2_flag[0] = step2_flag[1] = 1;
+ for (j=2; j < g->values; ++j) {
+ int low, high, pred, highroom, lowroom, room, val;
+ low = g->neighbors[j][0];
+ high = g->neighbors[j][1];
+ //neighbors(g->Xlist, j, &low, &high);
+ pred = predict_point(g->Xlist[j], g->Xlist[low], g->Xlist[high], finalY[low], finalY[high]);
+ val = finalY[j];
+ highroom = range - pred;
+ lowroom = pred;
+ if (highroom < lowroom)
+ room = highroom * 2;
+ else
+ room = lowroom * 2;
+ if (val) {
+ step2_flag[low] = step2_flag[high] = 1;
+ step2_flag[j] = 1;
+ if (val >= room)
+ if (highroom > lowroom)
+ finalY[j] = val - lowroom + pred;
+ else
+ finalY[j] = pred - val + highroom - 1;
+ else
+ if (val & 1)
+ finalY[j] = pred - ((val+1)>>1);
+ else
+ finalY[j] = pred + (val>>1);
+ } else {
+ step2_flag[j] = 0;
+ finalY[j] = pred;
+ }
+ }
+
+#ifdef STB_VORBIS_NO_DEFER_FLOOR
+ do_floor(f, map, i, n, f->floor_buffers[i], finalY, step2_flag);
+#else
+ // defer final floor computation until _after_ residue
+ for (j=0; j < g->values; ++j) {
+ if (!step2_flag[j])
+ finalY[j] = -1;
+ }
+#endif
+ } else {
+ error:
+ zero_channel[i] = TRUE;
+ }
+ // So we just defer everything else to later
+
+ // at this point we've decoded the floor into buffer
+ }
+ }
+ CHECK(f);
+ // at this point we've decoded all floors
+
+ if (f->alloc.alloc_buffer)
+ assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+
+ // re-enable coupled channels if necessary
+ memcpy(really_zero_channel, zero_channel, sizeof(really_zero_channel[0]) * f->channels);
+ for (i=0; i < map->coupling_steps; ++i)
+ if (!zero_channel[map->chan[i].magnitude] || !zero_channel[map->chan[i].angle]) {
+ zero_channel[map->chan[i].magnitude] = zero_channel[map->chan[i].angle] = FALSE;
+ }
+
+ CHECK(f);
+// RESIDUE DECODE
+ for (i=0; i < map->submaps; ++i) {
+ float *residue_buffers[STB_VORBIS_MAX_CHANNELS];
+ int r;
+ uint8 do_not_decode[256];
+ int ch = 0;
+ for (j=0; j < f->channels; ++j) {
+ if (map->chan[j].mux == i) {
+ if (zero_channel[j]) {
+ do_not_decode[ch] = TRUE;
+ residue_buffers[ch] = NULL;
+ } else {
+ do_not_decode[ch] = FALSE;
+ residue_buffers[ch] = f->channel_buffers[j];
+ }
+ ++ch;
+ }
+ }
+ r = map->submap_residue[i];
+ decode_residue(f, residue_buffers, ch, n2, r, do_not_decode);
+ }
+
+ if (f->alloc.alloc_buffer)
+ assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+ CHECK(f);
+
+// INVERSE COUPLING
+ for (i = map->coupling_steps-1; i >= 0; --i) {
+ int n2 = n >> 1;
+ float *m = f->channel_buffers[map->chan[i].magnitude];
+ float *a = f->channel_buffers[map->chan[i].angle ];
+ for (j=0; j < n2; ++j) {
+ float a2,m2;
+ if (m[j] > 0)
+ if (a[j] > 0)
+ m2 = m[j], a2 = m[j] - a[j];
+ else
+ a2 = m[j], m2 = m[j] + a[j];
+ else
+ if (a[j] > 0)
+ m2 = m[j], a2 = m[j] + a[j];
+ else
+ a2 = m[j], m2 = m[j] - a[j];
+ m[j] = m2;
+ a[j] = a2;
+ }
+ }
+ CHECK(f);
+
+ // finish decoding the floors
+#ifndef STB_VORBIS_NO_DEFER_FLOOR
+ for (i=0; i < f->channels; ++i) {
+ if (really_zero_channel[i]) {
+ memset(f->channel_buffers[i], 0, sizeof(*f->channel_buffers[i]) * n2);
+ } else {
+ do_floor(f, map, i, n, f->channel_buffers[i], f->finalY[i], NULL);
+ }
+ }
+#else
+ for (i=0; i < f->channels; ++i) {
+ if (really_zero_channel[i]) {
+ memset(f->channel_buffers[i], 0, sizeof(*f->channel_buffers[i]) * n2);
+ } else {
+ for (j=0; j < n2; ++j)
+ f->channel_buffers[i][j] *= f->floor_buffers[i][j];
+ }
+ }
+#endif
+
+// INVERSE MDCT
+ CHECK(f);
+ for (i=0; i < f->channels; ++i)
+ inverse_mdct(f->channel_buffers[i], n, f, m->blockflag);
+ CHECK(f);
+
+ // this shouldn't be necessary, unless we exited on an error
+ // and want to flush to get to the next packet
+ flush_packet(f);
+
+ if (f->first_decode) {
+ // assume we start so first non-discarded sample is sample 0
+ // this isn't to spec, but spec would require us to read ahead
+ // and decode the size of all current frames--could be done,
+ // but presumably it's not a commonly used feature
+ f->current_loc = 0u - n2; // start of first frame is positioned for discard (NB this is an intentional unsigned overflow/wrap-around)
+ // we might have to discard samples "from" the next frame too,
+ // if we're lapping a large block then a small at the start?
+ f->discard_samples_deferred = n - right_end;
+ f->current_loc_valid = TRUE;
+ f->first_decode = FALSE;
+ } else if (f->discard_samples_deferred) {
+ if (f->discard_samples_deferred >= right_start - left_start) {
+ f->discard_samples_deferred -= (right_start - left_start);
+ left_start = right_start;
+ *p_left = left_start;
+ } else {
+ left_start += f->discard_samples_deferred;
+ *p_left = left_start;
+ f->discard_samples_deferred = 0;
+ }
+ } else if (f->previous_length == 0 && f->current_loc_valid) {
+ // we're recovering from a seek... that means we're going to discard
+ // the samples from this packet even though we know our position from
+ // the last page header, so we need to update the position based on
+ // the discarded samples here
+ // but wait, the code below is going to add this in itself even
+ // on a discard, so we don't need to do it here...
+ }
+
+ // check if we have ogg information about the sample # for this packet
+ if (f->last_seg_which == f->end_seg_with_known_loc) {
+ // if we have a valid current loc, and this is final:
+ if (f->current_loc_valid && (f->page_flag & PAGEFLAG_last_page)) {
+ uint32 current_end = f->known_loc_for_packet;
+ // then let's infer the size of the (probably) short final frame
+ if (current_end < f->current_loc + (right_end-left_start)) {
+ if (current_end < f->current_loc) {
+ // negative truncation, that's impossible!
+ *len = 0;
+ } else {
+ *len = current_end - f->current_loc;
+ }
+ *len += left_start; // this doesn't seem right, but has no ill effect on my test files
+ if (*len > right_end) *len = right_end; // this should never happen
+ f->current_loc += *len;
+ return TRUE;
+ }
+ }
+ // otherwise, just set our sample loc
+ // guess that the ogg granule pos refers to the _middle_ of the
+ // last frame?
+ // set f->current_loc to the position of left_start
+ f->current_loc = f->known_loc_for_packet - (n2-left_start);
+ f->current_loc_valid = TRUE;
+ }
+ if (f->current_loc_valid)
+ f->current_loc += (right_start - left_start);
+
+ if (f->alloc.alloc_buffer)
+ assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+ *len = right_end; // ignore samples after the window goes to 0
+ CHECK(f);
+
+ return TRUE;
+}
+
+static int vorbis_decode_packet(vorb *f, int *len, int *p_left, int *p_right)
+{
+ int mode, left_end, right_end;
+ if (!vorbis_decode_initial(f, p_left, &left_end, p_right, &right_end, &mode)) return 0;
+ return vorbis_decode_packet_rest(f, len, f->mode_config + mode, *p_left, left_end, *p_right, right_end, p_left);
+}
+
+static int vorbis_finish_frame(stb_vorbis *f, int len, int left, int right)
+{
+ int prev,i,j;
+ // we use right&left (the start of the right- and left-window sin()-regions)
+ // to determine how much to return, rather than inferring from the rules
+ // (same result, clearer code); 'left' indicates where our sin() window
+ // starts, therefore where the previous window's right edge starts, and
+ // therefore where to start mixing from the previous buffer. 'right'
+ // indicates where our sin() ending-window starts, therefore that's where
+ // we start saving, and where our returned-data ends.
+
+ // mixin from previous window
+ if (f->previous_length) {
+ int i,j, n = f->previous_length;
+ float *w = get_window(f, n);
+ if (w == NULL) return 0;
+ for (i=0; i < f->channels; ++i) {
+ for (j=0; j < n; ++j)
+ f->channel_buffers[i][left+j] =
+ f->channel_buffers[i][left+j]*w[ j] +
+ f->previous_window[i][ j]*w[n-1-j];
+ }
+ }
+
+ prev = f->previous_length;
+
+ // last half of this data becomes previous window
+ f->previous_length = len - right;
+
+ // @OPTIMIZE: could avoid this copy by double-buffering the
+ // output (flipping previous_window with channel_buffers), but
+ // then previous_window would have to be 2x as large, and
+ // channel_buffers couldn't be temp mem (although they're NOT
+ // currently temp mem, they could be (unless we want to level
+ // performance by spreading out the computation))
+ for (i=0; i < f->channels; ++i)
+ for (j=0; right+j < len; ++j)
+ f->previous_window[i][j] = f->channel_buffers[i][right+j];
+
+ if (!prev)
+ // there was no previous packet, so this data isn't valid...
+ // this isn't entirely true, only the would-have-overlapped data
+ // isn't valid, but this seems to be what the spec requires
+ return 0;
+
+ // truncate a short frame
+ if (len < right) right = len;
+
+ f->samples_output += right-left;
+
+ return right - left;
+}
+
+static int vorbis_pump_first_frame(stb_vorbis *f)
+{
+ int len, right, left, res;
+ res = vorbis_decode_packet(f, &len, &left, &right);
+ if (res)
+ vorbis_finish_frame(f, len, left, right);
+ return res;
+}
+
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+static int is_whole_packet_present(stb_vorbis *f)
+{
+ // make sure that we have the packet available before continuing...
+ // this requires a full ogg parse, but we know we can fetch from f->stream
+
+ // instead of coding this out explicitly, we could save the current read state,
+ // read the next packet with get8() until end-of-packet, check f->eof, then
+ // reset the state? but that would be slower, esp. since we'd have over 256 bytes
+ // of state to restore (primarily the page segment table)
+
+ int s = f->next_seg, first = TRUE;
+ uint8 *p = f->stream;
+
+ if (s != -1) { // if we're not starting the packet with a 'continue on next page' flag
+ for (; s < f->segment_count; ++s) {
+ p += f->segments[s];
+ if (f->segments[s] < 255) // stop at first short segment
+ break;
+ }
+ // either the packet ends within this page, or it continues onto the next...
+ if (s == f->segment_count)
+ s = -1; // set 'crosses page' flag
+ if (p > f->stream_end) return error(f, VORBIS_need_more_data);
+ first = FALSE;
+ }
+ for (; s == -1;) {
+ uint8 *q;
+ int n;
+
+ // check that we have the page header ready
+ if (p + 26 >= f->stream_end) return error(f, VORBIS_need_more_data);
+ // validate the page
+ if (memcmp(p, ogg_page_header, 4)) return error(f, VORBIS_invalid_stream);
+ if (p[4] != 0) return error(f, VORBIS_invalid_stream);
+ if (first) { // the first segment must NOT have 'continued_packet', later ones MUST
+ if (f->previous_length)
+ if ((p[5] & PAGEFLAG_continued_packet)) return error(f, VORBIS_invalid_stream);
+ // if no previous length, we're resynching, so we can come in on a continued-packet,
+ // which we'll just drop
+ } else {
+ if (!(p[5] & PAGEFLAG_continued_packet)) return error(f, VORBIS_invalid_stream);
+ }
+ n = p[26]; // segment count
+ q = p+27; // q points to segment table
+ p = q + n; // advance past header
+ // make sure we've read the segment table
+ if (p > f->stream_end) return error(f, VORBIS_need_more_data);
+ for (s=0; s < n; ++s) {
+ p += q[s];
+ if (q[s] < 255)
+ break;
+ }
+ if (s == n)
+ s = -1; // set 'crosses page' flag
+ if (p > f->stream_end) return error(f, VORBIS_need_more_data);
+ first = FALSE;
+ }
+ return TRUE;
+}
+#endif // !STB_VORBIS_NO_PUSHDATA_API
+
+static int start_decoder(vorb *f)
+{
+ uint8 header[6], x,y;
+ int len,i,j,k, max_submaps = 0;
+ int longest_floorlist=0;
+
+ // first page, first packet
+ f->first_decode = TRUE;
+
+ if (!start_page(f)) return FALSE;
+ // validate page flag
+ if (!(f->page_flag & PAGEFLAG_first_page)) return error(f, VORBIS_invalid_first_page);
+ if (f->page_flag & PAGEFLAG_last_page) return error(f, VORBIS_invalid_first_page);
+ if (f->page_flag & PAGEFLAG_continued_packet) return error(f, VORBIS_invalid_first_page);
+ // check for expected packet length
+ if (f->segment_count != 1) return error(f, VORBIS_invalid_first_page);
+ if (f->segments[0] != 30) {
+ // check for the Ogg skeleton fishead identifying header to refine our error
+ if (f->segments[0] == 64 &&
+ getn(f, header, 6) &&
+ header[0] == 'f' &&
+ header[1] == 'i' &&
+ header[2] == 's' &&
+ header[3] == 'h' &&
+ header[4] == 'e' &&
+ header[5] == 'a' &&
+ get8(f) == 'd' &&
+ get8(f) == '\0') return error(f, VORBIS_ogg_skeleton_not_supported);
+ else
+ return error(f, VORBIS_invalid_first_page);
+ }
+
+ // read packet
+ // check packet header
+ if (get8(f) != VORBIS_packet_id) return error(f, VORBIS_invalid_first_page);
+ if (!getn(f, header, 6)) return error(f, VORBIS_unexpected_eof);
+ if (!vorbis_validate(header)) return error(f, VORBIS_invalid_first_page);
+ // vorbis_version
+ if (get32(f) != 0) return error(f, VORBIS_invalid_first_page);
+ f->channels = get8(f); if (!f->channels) return error(f, VORBIS_invalid_first_page);
+ if (f->channels > STB_VORBIS_MAX_CHANNELS) return error(f, VORBIS_too_many_channels);
+ f->sample_rate = get32(f); if (!f->sample_rate) return error(f, VORBIS_invalid_first_page);
+ get32(f); // bitrate_maximum
+ get32(f); // bitrate_nominal
+ get32(f); // bitrate_minimum
+ x = get8(f);
+ {
+ int log0,log1;
+ log0 = x & 15;
+ log1 = x >> 4;
+ f->blocksize_0 = 1 << log0;
+ f->blocksize_1 = 1 << log1;
+ if (log0 < 6 || log0 > 13) return error(f, VORBIS_invalid_setup);
+ if (log1 < 6 || log1 > 13) return error(f, VORBIS_invalid_setup);
+ if (log0 > log1) return error(f, VORBIS_invalid_setup);
+ }
+
+ // framing_flag
+ x = get8(f);
+ if (!(x & 1)) return error(f, VORBIS_invalid_first_page);
+
+ // second packet!
+ if (!start_page(f)) return FALSE;
+
+ if (!start_packet(f)) return FALSE;
+
+ if (!next_segment(f)) return FALSE;
+
+ if (get8_packet(f) != VORBIS_packet_comment) return error(f, VORBIS_invalid_setup);
+ for (i=0; i < 6; ++i) header[i] = get8_packet(f);
+ if (!vorbis_validate(header)) return error(f, VORBIS_invalid_setup);
+ // vendor string
+ len = get32_packet(f);
+ f->vendor = (char*)setup_malloc(f, sizeof(char) * (len+1));
+ if (f->vendor == NULL) return error(f, VORBIS_outofmem);
+ for(i=0; i < len; ++i) {
+ f->vendor[i] = get8_packet(f);
+ }
+ f->vendor[len] = (char)'\0';
+ // user comments
+ f->comment_list_length = get32_packet(f);
+ f->comment_list = NULL;
+ if (f->comment_list_length > 0)
+ {
+ f->comment_list = (char**) setup_malloc(f, sizeof(char*) * (f->comment_list_length));
+ if (f->comment_list == NULL) return error(f, VORBIS_outofmem);
+ }
+
+ for(i=0; i < f->comment_list_length; ++i) {
+ len = get32_packet(f);
+ f->comment_list[i] = (char*)setup_malloc(f, sizeof(char) * (len+1));
+ if (f->comment_list[i] == NULL) return error(f, VORBIS_outofmem);
+
+ for(j=0; j < len; ++j) {
+ f->comment_list[i][j] = get8_packet(f);
+ }
+ f->comment_list[i][len] = (char)'\0';
+ }
+
+ // framing_flag
+ x = get8_packet(f);
+ if (!(x & 1)) return error(f, VORBIS_invalid_setup);
+
+
+ skip(f, f->bytes_in_seg);
+ f->bytes_in_seg = 0;
+
+ do {
+ len = next_segment(f);
+ skip(f, len);
+ f->bytes_in_seg = 0;
+ } while (len);
+
+ // third packet!
+ if (!start_packet(f)) return FALSE;
+
+ #ifndef STB_VORBIS_NO_PUSHDATA_API
+ if (IS_PUSH_MODE(f)) {
+ if (!is_whole_packet_present(f)) {
+ // convert error in ogg header to the right type
+ if (f->error == VORBIS_invalid_stream)
+ f->error = VORBIS_invalid_setup;
+ return FALSE;
+ }
+ }
+ #endif
+
+ crc32_init(); // always init it, to avoid multithread race conditions
+
+ if (get8_packet(f) != VORBIS_packet_setup) return error(f, VORBIS_invalid_setup);
+ for (i=0; i < 6; ++i) header[i] = get8_packet(f);
+ if (!vorbis_validate(header)) return error(f, VORBIS_invalid_setup);
+
+ // codebooks
+
+ f->codebook_count = get_bits(f,8) + 1;
+ f->codebooks = (Codebook *) setup_malloc(f, sizeof(*f->codebooks) * f->codebook_count);
+ if (f->codebooks == NULL) return error(f, VORBIS_outofmem);
+ memset(f->codebooks, 0, sizeof(*f->codebooks) * f->codebook_count);
+ for (i=0; i < f->codebook_count; ++i) {
+ uint32 *values;
+ int ordered, sorted_count;
+ int total=0;
+ uint8 *lengths;
+ Codebook *c = f->codebooks+i;
+ CHECK(f);
+ x = get_bits(f, 8); if (x != 0x42) return error(f, VORBIS_invalid_setup);
+ x = get_bits(f, 8); if (x != 0x43) return error(f, VORBIS_invalid_setup);
+ x = get_bits(f, 8); if (x != 0x56) return error(f, VORBIS_invalid_setup);
+ x = get_bits(f, 8);
+ c->dimensions = (get_bits(f, 8)<<8) + x;
+ x = get_bits(f, 8);
+ y = get_bits(f, 8);
+ c->entries = (get_bits(f, 8)<<16) + (y<<8) + x;
+ ordered = get_bits(f,1);
+ c->sparse = ordered ? 0 : get_bits(f,1);
+
+ if (c->dimensions == 0 && c->entries != 0) return error(f, VORBIS_invalid_setup);
+
+ if (c->sparse)
+ lengths = (uint8 *) setup_temp_malloc(f, c->entries);
+ else
+ lengths = c->codeword_lengths = (uint8 *) setup_malloc(f, c->entries);
+
+ if (!lengths) return error(f, VORBIS_outofmem);
+
+ if (ordered) {
+ int current_entry = 0;
+ int current_length = get_bits(f,5) + 1;
+ while (current_entry < c->entries) {
+ int limit = c->entries - current_entry;
+ int n = get_bits(f, ilog(limit));
+ if (current_length >= 32) return error(f, VORBIS_invalid_setup);
+ if (current_entry + n > (int) c->entries) { return error(f, VORBIS_invalid_setup); }
+ memset(lengths + current_entry, current_length, n);
+ current_entry += n;
+ ++current_length;
+ }
+ } else {
+ for (j=0; j < c->entries; ++j) {
+ int present = c->sparse ? get_bits(f,1) : 1;
+ if (present) {
+ lengths[j] = get_bits(f, 5) + 1;
+ ++total;
+ if (lengths[j] == 32)
+ return error(f, VORBIS_invalid_setup);
+ } else {
+ lengths[j] = NO_CODE;
+ }
+ }
+ }
+
+ if (c->sparse && total >= c->entries >> 2) {
+ // convert sparse items to non-sparse!
+ if (c->entries > (int) f->setup_temp_memory_required)
+ f->setup_temp_memory_required = c->entries;
+
+ c->codeword_lengths = (uint8 *) setup_malloc(f, c->entries);
+ if (c->codeword_lengths == NULL) return error(f, VORBIS_outofmem);
+ memcpy(c->codeword_lengths, lengths, c->entries);
+ setup_temp_free(f, lengths, c->entries); // note this is only safe if there have been no intervening temp mallocs!
+ lengths = c->codeword_lengths;
+ c->sparse = 0;
+ }
+
+ // compute the size of the sorted tables
+ if (c->sparse) {
+ sorted_count = total;
+ } else {
+ sorted_count = 0;
+ #ifndef STB_VORBIS_NO_HUFFMAN_BINARY_SEARCH
+ for (j=0; j < c->entries; ++j)
+ if (lengths[j] > STB_VORBIS_FAST_HUFFMAN_LENGTH && lengths[j] != NO_CODE)
+ ++sorted_count;
+ #endif
+ }
+
+ c->sorted_entries = sorted_count;
+ values = NULL;
+
+ CHECK(f);
+ if (!c->sparse) {
+ c->codewords = (uint32 *) setup_malloc(f, sizeof(c->codewords[0]) * c->entries);
+ if (!c->codewords) return error(f, VORBIS_outofmem);
+ } else {
+ unsigned int size;
+ if (c->sorted_entries) {
+ c->codeword_lengths = (uint8 *) setup_malloc(f, c->sorted_entries);
+ if (!c->codeword_lengths) return error(f, VORBIS_outofmem);
+ c->codewords = (uint32 *) setup_temp_malloc(f, sizeof(*c->codewords) * c->sorted_entries);
+ if (!c->codewords) return error(f, VORBIS_outofmem);
+ values = (uint32 *) setup_temp_malloc(f, sizeof(*values) * c->sorted_entries);
+ if (!values) return error(f, VORBIS_outofmem);
+ }
+ size = c->entries + (sizeof(*c->codewords) + sizeof(*values)) * c->sorted_entries;
+ if (size > f->setup_temp_memory_required)
+ f->setup_temp_memory_required = size;
+ }
+
+ if (!compute_codewords(c, lengths, c->entries, values)) {
+ if (c->sparse) setup_temp_free(f, values, 0);
+ return error(f, VORBIS_invalid_setup);
+ }
+
+ if (c->sorted_entries) {
+ // allocate an extra slot for sentinels
+ c->sorted_codewords = (uint32 *) setup_malloc(f, sizeof(*c->sorted_codewords) * (c->sorted_entries+1));
+ if (c->sorted_codewords == NULL) return error(f, VORBIS_outofmem);
+ // allocate an extra slot at the front so that c->sorted_values[-1] is defined
+ // so that we can catch that case without an extra if
+ c->sorted_values = ( int *) setup_malloc(f, sizeof(*c->sorted_values ) * (c->sorted_entries+1));
+ if (c->sorted_values == NULL) return error(f, VORBIS_outofmem);
+ ++c->sorted_values;
+ c->sorted_values[-1] = -1;
+ compute_sorted_huffman(c, lengths, values);
+ }
+
+ if (c->sparse) {
+ setup_temp_free(f, values, sizeof(*values)*c->sorted_entries);
+ setup_temp_free(f, c->codewords, sizeof(*c->codewords)*c->sorted_entries);
+ setup_temp_free(f, lengths, c->entries);
+ c->codewords = NULL;
+ }
+
+ compute_accelerated_huffman(c);
+
+ CHECK(f);
+ c->lookup_type = get_bits(f, 4);
+ if (c->lookup_type > 2) return error(f, VORBIS_invalid_setup);
+ if (c->lookup_type > 0) {
+ uint16 *mults;
+ c->minimum_value = float32_unpack(get_bits(f, 32));
+ c->delta_value = float32_unpack(get_bits(f, 32));
+ c->value_bits = get_bits(f, 4)+1;
+ c->sequence_p = get_bits(f,1);
+ if (c->lookup_type == 1) {
+ int values = lookup1_values(c->entries, c->dimensions);
+ if (values < 0) return error(f, VORBIS_invalid_setup);
+ c->lookup_values = (uint32) values;
+ } else {
+ c->lookup_values = c->entries * c->dimensions;
+ }
+ if (c->lookup_values == 0) return error(f, VORBIS_invalid_setup);
+ mults = (uint16 *) setup_temp_malloc(f, sizeof(mults[0]) * c->lookup_values);
+ if (mults == NULL) return error(f, VORBIS_outofmem);
+ for (j=0; j < (int) c->lookup_values; ++j) {
+ int q = get_bits(f, c->value_bits);
+ if (q == EOP) { setup_temp_free(f,mults,sizeof(mults[0])*c->lookup_values); return error(f, VORBIS_invalid_setup); }
+ mults[j] = q;
+ }
+
+#ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ if (c->lookup_type == 1) {
+ int len, sparse = c->sparse;
+ float last=0;
+ // pre-expand the lookup1-style multiplicands, to avoid a divide in the inner loop
+ if (sparse) {
+ if (c->sorted_entries == 0) goto skip;
+ c->multiplicands = (codetype *) setup_malloc(f, sizeof(c->multiplicands[0]) * c->sorted_entries * c->dimensions);
+ } else
+ c->multiplicands = (codetype *) setup_malloc(f, sizeof(c->multiplicands[0]) * c->entries * c->dimensions);
+ if (c->multiplicands == NULL) { setup_temp_free(f,mults,sizeof(mults[0])*c->lookup_values); return error(f, VORBIS_outofmem); }
+ len = sparse ? c->sorted_entries : c->entries;
+ for (j=0; j < len; ++j) {
+ unsigned int z = sparse ? c->sorted_values[j] : j;
+ unsigned int div=1;
+ for (k=0; k < c->dimensions; ++k) {
+ int off = (z / div) % c->lookup_values;
+ float val = mults[off]*c->delta_value + c->minimum_value + last;
+ c->multiplicands[j*c->dimensions + k] = val;
+ if (c->sequence_p)
+ last = val;
+ if (k+1 < c->dimensions) {
+ if (div > UINT_MAX / (unsigned int) c->lookup_values) {
+ setup_temp_free(f, mults,sizeof(mults[0])*c->lookup_values);
+ return error(f, VORBIS_invalid_setup);
+ }
+ div *= c->lookup_values;
+ }
+ }
+ }
+ c->lookup_type = 2;
+ }
+ else
+#endif
+ {
+ float last=0;
+ CHECK(f);
+ c->multiplicands = (codetype *) setup_malloc(f, sizeof(c->multiplicands[0]) * c->lookup_values);
+ if (c->multiplicands == NULL) { setup_temp_free(f, mults,sizeof(mults[0])*c->lookup_values); return error(f, VORBIS_outofmem); }
+ for (j=0; j < (int) c->lookup_values; ++j) {
+ float val = mults[j] * c->delta_value + c->minimum_value + last;
+ c->multiplicands[j] = val;
+ if (c->sequence_p)
+ last = val;
+ }
+ }
+#ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+ skip:;
+#endif
+ setup_temp_free(f, mults, sizeof(mults[0])*c->lookup_values);
+
+ CHECK(f);
+ }
+ CHECK(f);
+ }
+
+ // time domain transforms (not used)
+
+ x = get_bits(f, 6) + 1;
+ for (i=0; i < x; ++i) {
+ uint32 z = get_bits(f, 16);
+ if (z != 0) return error(f, VORBIS_invalid_setup);
+ }
+
+ // Floors
+ f->floor_count = get_bits(f, 6)+1;
+ f->floor_config = (Floor *) setup_malloc(f, f->floor_count * sizeof(*f->floor_config));
+ if (f->floor_config == NULL) return error(f, VORBIS_outofmem);
+ for (i=0; i < f->floor_count; ++i) {
+ f->floor_types[i] = get_bits(f, 16);
+ if (f->floor_types[i] > 1) return error(f, VORBIS_invalid_setup);
+ if (f->floor_types[i] == 0) {
+ Floor0 *g = &f->floor_config[i].floor0;
+ g->order = get_bits(f,8);
+ g->rate = get_bits(f,16);
+ g->bark_map_size = get_bits(f,16);
+ g->amplitude_bits = get_bits(f,6);
+ g->amplitude_offset = get_bits(f,8);
+ g->number_of_books = get_bits(f,4) + 1;
+ for (j=0; j < g->number_of_books; ++j)
+ g->book_list[j] = get_bits(f,8);
+ return error(f, VORBIS_feature_not_supported);
+ } else {
+ stbv__floor_ordering p[31*8+2];
+ Floor1 *g = &f->floor_config[i].floor1;
+ int max_class = -1;
+ g->partitions = get_bits(f, 5);
+ for (j=0; j < g->partitions; ++j) {
+ g->partition_class_list[j] = get_bits(f, 4);
+ if (g->partition_class_list[j] > max_class)
+ max_class = g->partition_class_list[j];
+ }
+ for (j=0; j <= max_class; ++j) {
+ g->class_dimensions[j] = get_bits(f, 3)+1;
+ g->class_subclasses[j] = get_bits(f, 2);
+ if (g->class_subclasses[j]) {
+ g->class_masterbooks[j] = get_bits(f, 8);
+ if (g->class_masterbooks[j] >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+ }
+ for (k=0; k < 1 << g->class_subclasses[j]; ++k) {
+ g->subclass_books[j][k] = (int16)get_bits(f,8)-1;
+ if (g->subclass_books[j][k] >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+ }
+ }
+ g->floor1_multiplier = get_bits(f,2)+1;
+ g->rangebits = get_bits(f,4);
+ g->Xlist[0] = 0;
+ g->Xlist[1] = 1 << g->rangebits;
+ g->values = 2;
+ for (j=0; j < g->partitions; ++j) {
+ int c = g->partition_class_list[j];
+ for (k=0; k < g->class_dimensions[c]; ++k) {
+ g->Xlist[g->values] = get_bits(f, g->rangebits);
+ ++g->values;
+ }
+ }
+ // precompute the sorting
+ for (j=0; j < g->values; ++j) {
+ p[j].x = g->Xlist[j];
+ p[j].id = j;
+ }
+ qsort(p, g->values, sizeof(p[0]), point_compare);
+ for (j=0; j < g->values-1; ++j)
+ if (p[j].x == p[j+1].x)
+ return error(f, VORBIS_invalid_setup);
+ for (j=0; j < g->values; ++j)
+ g->sorted_order[j] = (uint8) p[j].id;
+ // precompute the neighbors
+ for (j=2; j < g->values; ++j) {
+ int low = 0,hi = 0;
+ neighbors(g->Xlist, j, &low,&hi);
+ g->neighbors[j][0] = low;
+ g->neighbors[j][1] = hi;
+ }
+
+ if (g->values > longest_floorlist)
+ longest_floorlist = g->values;
+ }
+ }
+
+ // Residue
+ f->residue_count = get_bits(f, 6)+1;
+ f->residue_config = (Residue *) setup_malloc(f, f->residue_count * sizeof(f->residue_config[0]));
+ if (f->residue_config == NULL) return error(f, VORBIS_outofmem);
+ memset(f->residue_config, 0, f->residue_count * sizeof(f->residue_config[0]));
+ for (i=0; i < f->residue_count; ++i) {
+ uint8 residue_cascade[64];
+ Residue *r = f->residue_config+i;
+ f->residue_types[i] = get_bits(f, 16);
+ if (f->residue_types[i] > 2) return error(f, VORBIS_invalid_setup);
+ r->begin = get_bits(f, 24);
+ r->end = get_bits(f, 24);
+ if (r->end < r->begin) return error(f, VORBIS_invalid_setup);
+ r->part_size = get_bits(f,24)+1;
+ r->classifications = get_bits(f,6)+1;
+ r->classbook = get_bits(f,8);
+ if (r->classbook >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+ for (j=0; j < r->classifications; ++j) {
+ uint8 high_bits=0;
+ uint8 low_bits=get_bits(f,3);
+ if (get_bits(f,1))
+ high_bits = get_bits(f,5);
+ residue_cascade[j] = high_bits*8 + low_bits;
+ }
+ r->residue_books = (short (*)[8]) setup_malloc(f, sizeof(r->residue_books[0]) * r->classifications);
+ if (r->residue_books == NULL) return error(f, VORBIS_outofmem);
+ for (j=0; j < r->classifications; ++j) {
+ for (k=0; k < 8; ++k) {
+ if (residue_cascade[j] & (1 << k)) {
+ r->residue_books[j][k] = get_bits(f, 8);
+ if (r->residue_books[j][k] >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+ } else {
+ r->residue_books[j][k] = -1;
+ }
+ }
+ }
+ // precompute the classifications[] array to avoid inner-loop mod/divide
+ // call it 'classdata' since we already have r->classifications
+ r->classdata = (uint8 **) setup_malloc(f, sizeof(*r->classdata) * f->codebooks[r->classbook].entries);
+ if (!r->classdata) return error(f, VORBIS_outofmem);
+ memset(r->classdata, 0, sizeof(*r->classdata) * f->codebooks[r->classbook].entries);
+ for (j=0; j < f->codebooks[r->classbook].entries; ++j) {
+ int classwords = f->codebooks[r->classbook].dimensions;
+ int temp = j;
+ r->classdata[j] = (uint8 *) setup_malloc(f, sizeof(r->classdata[j][0]) * classwords);
+ if (r->classdata[j] == NULL) return error(f, VORBIS_outofmem);
+ for (k=classwords-1; k >= 0; --k) {
+ r->classdata[j][k] = temp % r->classifications;
+ temp /= r->classifications;
+ }
+ }
+ }
+
+ f->mapping_count = get_bits(f,6)+1;
+ f->mapping = (Mapping *) setup_malloc(f, f->mapping_count * sizeof(*f->mapping));
+ if (f->mapping == NULL) return error(f, VORBIS_outofmem);
+ memset(f->mapping, 0, f->mapping_count * sizeof(*f->mapping));
+ for (i=0; i < f->mapping_count; ++i) {
+ Mapping *m = f->mapping + i;
+ int mapping_type = get_bits(f,16);
+ if (mapping_type != 0) return error(f, VORBIS_invalid_setup);
+ m->chan = (MappingChannel *) setup_malloc(f, f->channels * sizeof(*m->chan));
+ if (m->chan == NULL) return error(f, VORBIS_outofmem);
+ if (get_bits(f,1))
+ m->submaps = get_bits(f,4)+1;
+ else
+ m->submaps = 1;
+ if (m->submaps > max_submaps)
+ max_submaps = m->submaps;
+ if (get_bits(f,1)) {
+ m->coupling_steps = get_bits(f,8)+1;
+ if (m->coupling_steps > f->channels) return error(f, VORBIS_invalid_setup);
+ for (k=0; k < m->coupling_steps; ++k) {
+ m->chan[k].magnitude = get_bits(f, ilog(f->channels-1));
+ m->chan[k].angle = get_bits(f, ilog(f->channels-1));
+ if (m->chan[k].magnitude >= f->channels) return error(f, VORBIS_invalid_setup);
+ if (m->chan[k].angle >= f->channels) return error(f, VORBIS_invalid_setup);
+ if (m->chan[k].magnitude == m->chan[k].angle) return error(f, VORBIS_invalid_setup);
+ }
+ } else
+ m->coupling_steps = 0;
+
+ // reserved field
+ if (get_bits(f,2)) return error(f, VORBIS_invalid_setup);
+ if (m->submaps > 1) {
+ for (j=0; j < f->channels; ++j) {
+ m->chan[j].mux = get_bits(f, 4);
+ if (m->chan[j].mux >= m->submaps) return error(f, VORBIS_invalid_setup);
+ }
+ } else
+ // @SPECIFICATION: this case is missing from the spec
+ for (j=0; j < f->channels; ++j)
+ m->chan[j].mux = 0;
+
+ for (j=0; j < m->submaps; ++j) {
+ get_bits(f,8); // discard
+ m->submap_floor[j] = get_bits(f,8);
+ m->submap_residue[j] = get_bits(f,8);
+ if (m->submap_floor[j] >= f->floor_count) return error(f, VORBIS_invalid_setup);
+ if (m->submap_residue[j] >= f->residue_count) return error(f, VORBIS_invalid_setup);
+ }
+ }
+
+ // Modes
+ f->mode_count = get_bits(f, 6)+1;
+ for (i=0; i < f->mode_count; ++i) {
+ Mode *m = f->mode_config+i;
+ m->blockflag = get_bits(f,1);
+ m->windowtype = get_bits(f,16);
+ m->transformtype = get_bits(f,16);
+ m->mapping = get_bits(f,8);
+ if (m->windowtype != 0) return error(f, VORBIS_invalid_setup);
+ if (m->transformtype != 0) return error(f, VORBIS_invalid_setup);
+ if (m->mapping >= f->mapping_count) return error(f, VORBIS_invalid_setup);
+ }
+
+ flush_packet(f);
+
+ f->previous_length = 0;
+
+ for (i=0; i < f->channels; ++i) {
+ f->channel_buffers[i] = (float *) setup_malloc(f, sizeof(float) * f->blocksize_1);
+ f->previous_window[i] = (float *) setup_malloc(f, sizeof(float) * f->blocksize_1/2);
+ f->finalY[i] = (int16 *) setup_malloc(f, sizeof(int16) * longest_floorlist);
+ if (f->channel_buffers[i] == NULL || f->previous_window[i] == NULL || f->finalY[i] == NULL) return error(f, VORBIS_outofmem);
+ memset(f->channel_buffers[i], 0, sizeof(float) * f->blocksize_1);
+ #ifdef STB_VORBIS_NO_DEFER_FLOOR
+ f->floor_buffers[i] = (float *) setup_malloc(f, sizeof(float) * f->blocksize_1/2);
+ if (f->floor_buffers[i] == NULL) return error(f, VORBIS_outofmem);
+ #endif
+ }
+
+ if (!init_blocksize(f, 0, f->blocksize_0)) return FALSE;
+ if (!init_blocksize(f, 1, f->blocksize_1)) return FALSE;
+ f->blocksize[0] = f->blocksize_0;
+ f->blocksize[1] = f->blocksize_1;
+
+#ifdef STB_VORBIS_DIVIDE_TABLE
+ if (integer_divide_table[1][1]==0)
+ for (i=0; i < DIVTAB_NUMER; ++i)
+ for (j=1; j < DIVTAB_DENOM; ++j)
+ integer_divide_table[i][j] = i / j;
+#endif
+
+ // compute how much temporary memory is needed
+
+ // 1.
+ {
+ uint32 imdct_mem = (f->blocksize_1 * sizeof(float) >> 1);
+ uint32 classify_mem;
+ int i,max_part_read=0;
+ for (i=0; i < f->residue_count; ++i) {
+ Residue *r = f->residue_config + i;
+ unsigned int actual_size = f->blocksize_1 / 2;
+ unsigned int limit_r_begin = r->begin < actual_size ? r->begin : actual_size;
+ unsigned int limit_r_end = r->end < actual_size ? r->end : actual_size;
+ int n_read = limit_r_end - limit_r_begin;
+ int part_read = n_read / r->part_size;
+ if (part_read > max_part_read)
+ max_part_read = part_read;
+ }
+ #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+ classify_mem = f->channels * (sizeof(void*) + max_part_read * sizeof(uint8 *));
+ #else
+ classify_mem = f->channels * (sizeof(void*) + max_part_read * sizeof(int *));
+ #endif
+
+ // maximum reasonable partition size is f->blocksize_1
+
+ f->temp_memory_required = classify_mem;
+ if (imdct_mem > f->temp_memory_required)
+ f->temp_memory_required = imdct_mem;
+ }
+
+
+ if (f->alloc.alloc_buffer) {
+ assert(f->temp_offset == f->alloc.alloc_buffer_length_in_bytes);
+ // check if there's enough temp memory so we don't error later
+ if (f->setup_offset + sizeof(*f) + f->temp_memory_required > (unsigned) f->temp_offset)
+ return error(f, VORBIS_outofmem);
+ }
+
+ // @TODO: stb_vorbis_seek_start expects first_audio_page_offset to point to a page
+ // without PAGEFLAG_continued_packet, so this either points to the first page, or
+ // the page after the end of the headers. It might be cleaner to point to a page
+ // in the middle of the headers, when that's the page where the first audio packet
+ // starts, but we'd have to also correctly skip the end of any continued packet in
+ // stb_vorbis_seek_start.
+ if (f->next_seg == -1) {
+ f->first_audio_page_offset = stb_vorbis_get_file_offset(f);
+ } else {
+ f->first_audio_page_offset = 0;
+ }
+
+ return TRUE;
+}
+
+static void vorbis_deinit(stb_vorbis *p)
+{
+ int i,j;
+
+ setup_free(p, p->vendor);
+ for (i=0; i < p->comment_list_length; ++i) {
+ setup_free(p, p->comment_list[i]);
+ }
+ setup_free(p, p->comment_list);
+
+ if (p->residue_config) {
+ for (i=0; i < p->residue_count; ++i) {
+ Residue *r = p->residue_config+i;
+ if (r->classdata) {
+ for (j=0; j < p->codebooks[r->classbook].entries; ++j)
+ setup_free(p, r->classdata[j]);
+ setup_free(p, r->classdata);
+ }
+ setup_free(p, r->residue_books);
+ }
+ }
+
+ if (p->codebooks) {
+ CHECK(p);
+ for (i=0; i < p->codebook_count; ++i) {
+ Codebook *c = p->codebooks + i;
+ setup_free(p, c->codeword_lengths);
+ setup_free(p, c->multiplicands);
+ setup_free(p, c->codewords);
+ setup_free(p, c->sorted_codewords);
+ // c->sorted_values[-1] is the first entry in the array
+ setup_free(p, c->sorted_values ? c->sorted_values-1 : NULL);
+ }
+ setup_free(p, p->codebooks);
+ }
+ setup_free(p, p->floor_config);
+ setup_free(p, p->residue_config);
+ if (p->mapping) {
+ for (i=0; i < p->mapping_count; ++i)
+ setup_free(p, p->mapping[i].chan);
+ setup_free(p, p->mapping);
+ }
+ CHECK(p);
+ for (i=0; i < p->channels && i < STB_VORBIS_MAX_CHANNELS; ++i) {
+ setup_free(p, p->channel_buffers[i]);
+ setup_free(p, p->previous_window[i]);
+ #ifdef STB_VORBIS_NO_DEFER_FLOOR
+ setup_free(p, p->floor_buffers[i]);
+ #endif
+ setup_free(p, p->finalY[i]);
+ }
+ for (i=0; i < 2; ++i) {
+ setup_free(p, p->A[i]);
+ setup_free(p, p->B[i]);
+ setup_free(p, p->C[i]);
+ setup_free(p, p->window[i]);
+ setup_free(p, p->bit_reverse[i]);
+ }
+ #ifndef STB_VORBIS_NO_STDIO
+ if (p->close_on_free) fclose(p->f);
+ #endif
+}
+
+void stb_vorbis_close(stb_vorbis *p)
+{
+ if (p == NULL) return;
+ vorbis_deinit(p);
+ setup_free(p,p);
+}
+
+static void vorbis_init(stb_vorbis *p, const stb_vorbis_alloc *z)
+{
+ memset(p, 0, sizeof(*p)); // NULL out all malloc'd pointers to start
+ if (z) {
+ p->alloc = *z;
+ p->alloc.alloc_buffer_length_in_bytes &= ~7;
+ p->temp_offset = p->alloc.alloc_buffer_length_in_bytes;
+ }
+ p->eof = 0;
+ p->error = VORBIS__no_error;
+ p->stream = NULL;
+ p->codebooks = NULL;
+ p->page_crc_tests = -1;
+ #ifndef STB_VORBIS_NO_STDIO
+ p->close_on_free = FALSE;
+ p->f = NULL;
+ #endif
+}
+
+int stb_vorbis_get_sample_offset(stb_vorbis *f)
+{
+ if (f->current_loc_valid)
+ return f->current_loc;
+ else
+ return -1;
+}
+
+stb_vorbis_info stb_vorbis_get_info(stb_vorbis *f)
+{
+ stb_vorbis_info d;
+ d.channels = f->channels;
+ d.sample_rate = f->sample_rate;
+ d.setup_memory_required = f->setup_memory_required;
+ d.setup_temp_memory_required = f->setup_temp_memory_required;
+ d.temp_memory_required = f->temp_memory_required;
+ d.max_frame_size = f->blocksize_1 >> 1;
+ return d;
+}
+
+stb_vorbis_comment stb_vorbis_get_comment(stb_vorbis *f)
+{
+ stb_vorbis_comment d;
+ d.vendor = f->vendor;
+ d.comment_list_length = f->comment_list_length;
+ d.comment_list = f->comment_list;
+ return d;
+}
+
+int stb_vorbis_get_error(stb_vorbis *f)
+{
+ int e = f->error;
+ f->error = VORBIS__no_error;
+ return e;
+}
+
+static stb_vorbis * vorbis_alloc(stb_vorbis *f)
+{
+ stb_vorbis *p = (stb_vorbis *) setup_malloc(f, sizeof(*p));
+ return p;
+}
+
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+
+void stb_vorbis_flush_pushdata(stb_vorbis *f)
+{
+ f->previous_length = 0;
+ f->page_crc_tests = 0;
+ f->discard_samples_deferred = 0;
+ f->current_loc_valid = FALSE;
+ f->first_decode = FALSE;
+ f->samples_output = 0;
+ f->channel_buffer_start = 0;
+ f->channel_buffer_end = 0;
+}
+
+static int vorbis_search_for_page_pushdata(vorb *f, uint8 *data, int data_len)
+{
+ int i,n;
+ for (i=0; i < f->page_crc_tests; ++i)
+ f->scan[i].bytes_done = 0;
+
+ // if we have room for more scans, search for them first, because
+ // they may cause us to stop early if their header is incomplete
+ if (f->page_crc_tests < STB_VORBIS_PUSHDATA_CRC_COUNT) {
+ if (data_len < 4) return 0;
+ data_len -= 3; // need to look for 4-byte sequence, so don't miss
+ // one that straddles a boundary
+ for (i=0; i < data_len; ++i) {
+ if (data[i] == 0x4f) {
+ if (0==memcmp(data+i, ogg_page_header, 4)) {
+ int j,len;
+ uint32 crc;
+ // make sure we have the whole page header
+ if (i+26 >= data_len || i+27+data[i+26] >= data_len) {
+ // only read up to this page start, so hopefully we'll
+ // have the whole page header start next time
+ data_len = i;
+ break;
+ }
+ // ok, we have it all; compute the length of the page
+ len = 27 + data[i+26];
+ for (j=0; j < data[i+26]; ++j)
+ len += data[i+27+j];
+ // scan everything up to the embedded crc (which we must 0)
+ crc = 0;
+ for (j=0; j < 22; ++j)
+ crc = crc32_update(crc, data[i+j]);
+ // now process 4 0-bytes
+ for ( ; j < 26; ++j)
+ crc = crc32_update(crc, 0);
+ // len is the total number of bytes we need to scan
+ n = f->page_crc_tests++;
+ f->scan[n].bytes_left = len-j;
+ f->scan[n].crc_so_far = crc;
+ f->scan[n].goal_crc = data[i+22] + (data[i+23] << 8) + (data[i+24]<<16) + (data[i+25]<<24);
+ // if the last frame on a page is continued to the next, then
+ // we can't recover the sample_loc immediately
+ if (data[i+27+data[i+26]-1] == 255)
+ f->scan[n].sample_loc = ~0;
+ else
+ f->scan[n].sample_loc = data[i+6] + (data[i+7] << 8) + (data[i+ 8]<<16) + (data[i+ 9]<<24);
+ f->scan[n].bytes_done = i+j;
+ if (f->page_crc_tests == STB_VORBIS_PUSHDATA_CRC_COUNT)
+ break;
+ // keep going if we still have room for more
+ }
+ }
+ }
+ }
+
+ for (i=0; i < f->page_crc_tests;) {
+ uint32 crc;
+ int j;
+ int n = f->scan[i].bytes_done;
+ int m = f->scan[i].bytes_left;
+ if (m > data_len - n) m = data_len - n;
+ // m is the bytes to scan in the current chunk
+ crc = f->scan[i].crc_so_far;
+ for (j=0; j < m; ++j)
+ crc = crc32_update(crc, data[n+j]);
+ f->scan[i].bytes_left -= m;
+ f->scan[i].crc_so_far = crc;
+ if (f->scan[i].bytes_left == 0) {
+ // does it match?
+ if (f->scan[i].crc_so_far == f->scan[i].goal_crc) {
+            // Houston, we have a page
+ data_len = n+m; // consumption amount is wherever that scan ended
+ f->page_crc_tests = -1; // drop out of page scan mode
+ f->previous_length = 0; // decode-but-don't-output one frame
+ f->next_seg = -1; // start a new page
+ f->current_loc = f->scan[i].sample_loc; // set the current sample location
+ // to the amount we'd have decoded had we decoded this page
+ f->current_loc_valid = f->current_loc != ~0U;
+ return data_len;
+ }
+ // delete entry
+ f->scan[i] = f->scan[--f->page_crc_tests];
+ } else {
+ ++i;
+ }
+ }
+
+ return data_len;
+}
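The scan above accumulates a running CRC via `crc32_update` and compares it against the goal CRC stored at header bytes 22..25 (which are fed in as zeroes, since the checksum field is defined to be zero during computation). Ogg uses a CRC-32 with polynomial 0x04C11DB7, no bit reflection, zero initial value, and no final XOR. `crc32_update` is defined elsewhere in this file; the table-driven sketch below is an assumption matching the Ogg specification, cross-checked against a bitwise reference:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ogg CRC-32: polynomial 0x04C11DB7, unreflected, init 0, no final XOR. */
static uint32_t ogg_crc_table[256];

static void ogg_crc_init(void)
{
   uint32_t i;
   int j;
   for (i = 0; i < 256; ++i) {
      uint32_t r = i << 24;
      for (j = 0; j < 8; ++j)
         r = (r & 0x80000000u) ? (r << 1) ^ 0x04C11DB7u : (r << 1);
      ogg_crc_table[i] = r;
   }
}

/* table-driven update, one byte at a time (same shape as crc32_update) */
static uint32_t ogg_crc_update(uint32_t crc, uint8_t byte)
{
   return (crc << 8) ^ ogg_crc_table[byte ^ (crc >> 24)];
}

/* bitwise reference implementation, for cross-checking the table */
static uint32_t ogg_crc_bitwise(const uint8_t *data, size_t len)
{
   uint32_t crc = 0;
   size_t i;
   int j;
   for (i = 0; i < len; ++i) {
      crc ^= (uint32_t)data[i] << 24;
      for (j = 0; j < 8; ++j)
         crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
   }
   return crc;
}
```

Because the CRC is a plain running fold over bytes, a partially scanned page can be suspended and resumed later, which is exactly what the `scan[]` entries above do with `crc_so_far` and `bytes_left`.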
+
+// return value: number of bytes we used
+int stb_vorbis_decode_frame_pushdata(
+ stb_vorbis *f, // the file we're decoding
+ const uint8 *data, int data_len, // the memory available for decoding
+ int *channels, // place to write number of float * buffers
+ float ***output, // place to write float ** array of float * buffers
+ int *samples // place to write number of output samples
+ )
+{
+ int i;
+ int len,right,left;
+
+ if (!IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+
+ if (f->page_crc_tests >= 0) {
+ *samples = 0;
+ return vorbis_search_for_page_pushdata(f, (uint8 *) data, data_len);
+ }
+
+ f->stream = (uint8 *) data;
+ f->stream_end = (uint8 *) data + data_len;
+ f->error = VORBIS__no_error;
+
+ // check that we have the entire packet in memory
+ if (!is_whole_packet_present(f)) {
+ *samples = 0;
+ return 0;
+ }
+
+ if (!vorbis_decode_packet(f, &len, &left, &right)) {
+ // save the actual error we encountered
+ enum STBVorbisError error = f->error;
+ if (error == VORBIS_bad_packet_type) {
+ // flush and resynch
+ f->error = VORBIS__no_error;
+ while (get8_packet(f) != EOP)
+ if (f->eof) break;
+ *samples = 0;
+ return (int) (f->stream - data);
+ }
+ if (error == VORBIS_continued_packet_flag_invalid) {
+ if (f->previous_length == 0) {
+ // we may be resynching, in which case it's ok to hit one
+ // of these; just discard the packet
+ f->error = VORBIS__no_error;
+ while (get8_packet(f) != EOP)
+ if (f->eof) break;
+ *samples = 0;
+ return (int) (f->stream - data);
+ }
+ }
+ // if we get an error while parsing, what to do?
+ // well, it DEFINITELY won't work to continue from where we are!
+ stb_vorbis_flush_pushdata(f);
+ // restore the error that actually made us bail
+ f->error = error;
+ *samples = 0;
+ return 1;
+ }
+
+ // success!
+ len = vorbis_finish_frame(f, len, left, right);
+ for (i=0; i < f->channels; ++i)
+ f->outputs[i] = f->channel_buffers[i] + left;
+
+ if (channels) *channels = f->channels;
+ *samples = len;
+ *output = f->outputs;
+ return (int) (f->stream - data);
+}
+
+stb_vorbis *stb_vorbis_open_pushdata(
+ const unsigned char *data, int data_len, // the memory available for decoding
+ int *data_used, // only defined if result is not NULL
+ int *error, const stb_vorbis_alloc *alloc)
+{
+ stb_vorbis *f, p;
+ vorbis_init(&p, alloc);
+ p.stream = (uint8 *) data;
+ p.stream_end = (uint8 *) data + data_len;
+ p.push_mode = TRUE;
+ if (!start_decoder(&p)) {
+ if (p.eof)
+ *error = VORBIS_need_more_data;
+ else
+ *error = p.error;
+ vorbis_deinit(&p);
+ return NULL;
+ }
+ f = vorbis_alloc(&p);
+ if (f) {
+ *f = p;
+ *data_used = (int) (f->stream - data);
+ *error = 0;
+ return f;
+ } else {
+ vorbis_deinit(&p);
+ return NULL;
+ }
+}
+#endif // STB_VORBIS_NO_PUSHDATA_API
+
+unsigned int stb_vorbis_get_file_offset(stb_vorbis *f)
+{
+ #ifndef STB_VORBIS_NO_PUSHDATA_API
+ if (f->push_mode) return 0;
+ #endif
+ if (USE_MEMORY(f)) return (unsigned int) (f->stream - f->stream_start);
+ #ifndef STB_VORBIS_NO_STDIO
+ return (unsigned int) (ftell(f->f) - f->f_start);
+ #endif
+}
+
+#ifndef STB_VORBIS_NO_PULLDATA_API
+//
+// DATA-PULLING API
+//
+
+static uint32 vorbis_find_page(stb_vorbis *f, uint32 *end, uint32 *last)
+{
+ for(;;) {
+ int n;
+ if (f->eof) return 0;
+ n = get8(f);
+ if (n == 0x4f) { // page header candidate
+ unsigned int retry_loc = stb_vorbis_get_file_offset(f);
+ int i;
+ // check if we're off the end of a file_section stream
+ if (retry_loc - 25 > f->stream_len)
+ return 0;
+ // check the rest of the header
+ for (i=1; i < 4; ++i)
+ if (get8(f) != ogg_page_header[i])
+ break;
+ if (f->eof) return 0;
+ if (i == 4) {
+ uint8 header[27];
+ uint32 i, crc, goal, len;
+ for (i=0; i < 4; ++i)
+ header[i] = ogg_page_header[i];
+ for (; i < 27; ++i)
+ header[i] = get8(f);
+ if (f->eof) return 0;
+ if (header[4] != 0) goto invalid;
+ goal = header[22] + (header[23] << 8) + (header[24]<<16) + ((uint32)header[25]<<24);
+ for (i=22; i < 26; ++i)
+ header[i] = 0;
+ crc = 0;
+ for (i=0; i < 27; ++i)
+ crc = crc32_update(crc, header[i]);
+ len = 0;
+ for (i=0; i < header[26]; ++i) {
+ int s = get8(f);
+ crc = crc32_update(crc, s);
+ len += s;
+ }
+ if (len && f->eof) return 0;
+ for (i=0; i < len; ++i)
+ crc = crc32_update(crc, get8(f));
+ // finished parsing probable page
+ if (crc == goal) {
+ // we could now check that it's either got the last
+ // page flag set, OR it's followed by the capture
+ // pattern, but I guess TECHNICALLY you could have
+ // a file with garbage between each ogg page and recover
+ // from it automatically? So even though that paranoia
+ // might decrease the chance of an invalid decode by
+ // another 2^32, not worth it since it would hose those
+ // invalid-but-useful files?
+ if (end)
+ *end = stb_vorbis_get_file_offset(f);
+ if (last) {
+ if (header[5] & 0x04)
+ *last = 1;
+ else
+ *last = 0;
+ }
+ set_file_offset(f, retry_loc-1);
+ return 1;
+ }
+ }
+ invalid:
+ // not a valid page, so rewind and look for next one
+ set_file_offset(f, retry_loc);
+ }
+ }
+}
+
+
+#define SAMPLE_unknown 0xffffffff
+
+// seeking is implemented with a binary search, which narrows down the range to
+// 64K, before using a linear search (because finding the synchronization
+// pattern can be expensive, and the chance we'd find the end page again is
+// relatively high for small ranges)
+//
+// two initial interpolation-style probes are used at the start of the search
+// to try to bound either side of the binary search sensibly, while still
+// working in O(log n) time if they fail.
+
+static int get_seek_page_info(stb_vorbis *f, ProbedPage *z)
+{
+ uint8 header[27], lacing[255];
+ int i,len;
+
+ // record where the page starts
+ z->page_start = stb_vorbis_get_file_offset(f);
+
+ // parse the header
+ getn(f, header, 27);
+ if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S')
+ return 0;
+ getn(f, lacing, header[26]);
+
+ // determine the length of the payload
+ len = 0;
+ for (i=0; i < header[26]; ++i)
+ len += lacing[i];
+
+ // this implies where the page ends
+ z->page_end = z->page_start + 27 + header[26] + len;
+
+ // read the last-decoded sample out of the data
+ z->last_decoded_sample = header[6] + (header[7] << 8) + (header[8] << 16) + (header[9] << 24);
+
+ // restore file state to where we were
+ set_file_offset(f, z->page_start);
+ return 1;
+}
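The page geometry computed above follows directly from the Ogg framing: a page is a fixed 27-byte header, then `header[26]` lacing bytes, then a payload whose length is the sum of the lacing values, and the low 32 bits of the granule position sit little-endian at header bytes 6..9. A self-contained sketch of that arithmetic (function names here are illustrative, not from this file):

```c
#include <assert.h>
#include <stdint.h>

/* little-endian 32-bit read, as done for the granule position (header[6..9]) */
static uint32_t read_le32(const uint8_t *p)
{
   return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
        | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* total page size = 27 fixed header bytes + lacing table + payload */
static uint32_t ogg_page_size(const uint8_t header[27], const uint8_t *lacing)
{
   uint32_t len = 27 + header[26];
   int i;
   for (i = 0; i < header[26]; ++i)
      len += lacing[i];
   return len;
}
```

Note that a lacing value of 255 means the segment continues into the next lacing value (and a final 255 means the packet continues onto the next page), which is why the push-data scanner earlier treats a trailing 255 as "sample location unknown".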
+
+// rarely used function to seek back to the preceding page while finding the
+// start of a packet
+static int go_to_page_before(stb_vorbis *f, unsigned int limit_offset)
+{
+ unsigned int previous_safe, end;
+
+ // now we want to seek back 64K from the limit
+ if (limit_offset >= 65536 && limit_offset-65536 >= f->first_audio_page_offset)
+ previous_safe = limit_offset - 65536;
+ else
+ previous_safe = f->first_audio_page_offset;
+
+ set_file_offset(f, previous_safe);
+
+ while (vorbis_find_page(f, &end, NULL)) {
+ if (end >= limit_offset && stb_vorbis_get_file_offset(f) < limit_offset)
+ return 1;
+ set_file_offset(f, end);
+ }
+
+ return 0;
+}
+
+// implements the search logic for finding a page and starting decoding. if
+// the function succeeds, current_loc_valid will be true and current_loc will
+// be less than or equal to the provided sample number (the closer the
+// better).
+static int seek_to_sample_coarse(stb_vorbis *f, uint32 sample_number)
+{
+ ProbedPage left, right, mid;
+ int i, start_seg_with_known_loc, end_pos, page_start;
+ uint32 delta, stream_length, padding, last_sample_limit;
+ double offset = 0.0, bytes_per_sample = 0.0;
+ int probe = 0;
+
+ // find the last page and validate the target sample
+ stream_length = stb_vorbis_stream_length_in_samples(f);
+ if (stream_length == 0) return error(f, VORBIS_seek_without_length);
+ if (sample_number > stream_length) return error(f, VORBIS_seek_invalid);
+
+ // this is the maximum difference between the window-center (which is the
+ // actual granule position value), and the right-start (which the spec
+ // indicates should be the granule position (give or take one)).
+ padding = ((f->blocksize_1 - f->blocksize_0) >> 2);
+ if (sample_number < padding)
+ last_sample_limit = 0;
+ else
+ last_sample_limit = sample_number - padding;
+
+ left = f->p_first;
+ while (left.last_decoded_sample == ~0U) {
+ // (untested) the first page does not have a 'last_decoded_sample'
+ set_file_offset(f, left.page_end);
+ if (!get_seek_page_info(f, &left)) goto error;
+ }
+
+ right = f->p_last;
+ assert(right.last_decoded_sample != ~0U);
+
+ // starting from the start is handled differently
+ if (last_sample_limit <= left.last_decoded_sample) {
+ if (stb_vorbis_seek_start(f)) {
+ if (f->current_loc > sample_number)
+ return error(f, VORBIS_seek_failed);
+ return 1;
+ }
+ return 0;
+ }
+
+ while (left.page_end != right.page_start) {
+ assert(left.page_end < right.page_start);
+ // search range in bytes
+ delta = right.page_start - left.page_end;
+ if (delta <= 65536) {
+ // there's only 64K left to search - handle it linearly
+ set_file_offset(f, left.page_end);
+ } else {
+ if (probe < 2) {
+ if (probe == 0) {
+ // first probe (interpolate)
+ double data_bytes = right.page_end - left.page_start;
+ bytes_per_sample = data_bytes / right.last_decoded_sample;
+ offset = left.page_start + bytes_per_sample * (last_sample_limit - left.last_decoded_sample);
+ } else {
+ // second probe (try to bound the other side)
+ double error = ((double) last_sample_limit - mid.last_decoded_sample) * bytes_per_sample;
+ if (error >= 0 && error < 8000) error = 8000;
+ if (error < 0 && error > -8000) error = -8000;
+ offset += error * 2;
+ }
+
+ // ensure the offset is valid
+ if (offset < left.page_end)
+ offset = left.page_end;
+ if (offset > right.page_start - 65536)
+ offset = right.page_start - 65536;
+
+ set_file_offset(f, (unsigned int) offset);
+ } else {
+ // binary search for large ranges (offset by 32K to ensure
+ // we don't hit the right page)
+ set_file_offset(f, left.page_end + (delta / 2) - 32768);
+ }
+
+ if (!vorbis_find_page(f, NULL, NULL)) goto error;
+ }
+
+ for (;;) {
+ if (!get_seek_page_info(f, &mid)) goto error;
+ if (mid.last_decoded_sample != ~0U) break;
+ // (untested) no frames end on this page
+ set_file_offset(f, mid.page_end);
+ assert(mid.page_start < right.page_start);
+ }
+
+ // if we've just found the last page again then we're in a tricky file,
+ // and we're close enough (if it wasn't an interpolation probe).
+ if (mid.page_start == right.page_start) {
+ if (probe >= 2 || delta <= 65536)
+ break;
+ } else {
+ if (last_sample_limit < mid.last_decoded_sample)
+ right = mid;
+ else
+ left = mid;
+ }
+
+ ++probe;
+ }
+
+ // seek back to start of the last packet
+ page_start = left.page_start;
+ set_file_offset(f, page_start);
+ if (!start_page(f)) return error(f, VORBIS_seek_failed);
+ end_pos = f->end_seg_with_known_loc;
+ assert(end_pos >= 0);
+
+ for (;;) {
+ for (i = end_pos; i > 0; --i)
+ if (f->segments[i-1] != 255)
+ break;
+
+ start_seg_with_known_loc = i;
+
+ if (start_seg_with_known_loc > 0 || !(f->page_flag & PAGEFLAG_continued_packet))
+ break;
+
+ // (untested) the final packet begins on an earlier page
+ if (!go_to_page_before(f, page_start))
+ goto error;
+
+ page_start = stb_vorbis_get_file_offset(f);
+ if (!start_page(f)) goto error;
+ end_pos = f->segment_count - 1;
+ }
+
+ // prepare to start decoding
+ f->current_loc_valid = FALSE;
+ f->last_seg = FALSE;
+ f->valid_bits = 0;
+ f->packet_bytes = 0;
+ f->bytes_in_seg = 0;
+ f->previous_length = 0;
+ f->next_seg = start_seg_with_known_loc;
+
+ for (i = 0; i < start_seg_with_known_loc; i++)
+ skip(f, f->segments[i]);
+
+ // start decoding (optimizable - this frame is generally discarded)
+ if (!vorbis_pump_first_frame(f))
+ return 0;
+ if (f->current_loc > sample_number)
+ return error(f, VORBIS_seek_failed);
+ return 1;
+
+error:
+ // try to restore the file to a valid state
+ stb_vorbis_seek_start(f);
+ return error(f, VORBIS_seek_failed);
+}
+
+// the same as vorbis_decode_initial, but without advancing
+static int peek_decode_initial(vorb *f, int *p_left_start, int *p_left_end, int *p_right_start, int *p_right_end, int *mode)
+{
+ int bits_read, bytes_read;
+
+ if (!vorbis_decode_initial(f, p_left_start, p_left_end, p_right_start, p_right_end, mode))
+ return 0;
+
+ // either 1 or 2 bytes were read, figure out which so we can rewind
+ bits_read = 1 + ilog(f->mode_count-1);
+ if (f->mode_config[*mode].blockflag)
+ bits_read += 2;
+ bytes_read = (bits_read + 7) / 8;
+
+ f->bytes_in_seg += bytes_read;
+ f->packet_bytes -= bytes_read;
+ skip(f, -bytes_read);
+ if (f->next_seg == -1)
+ f->next_seg = f->segment_count - 1;
+ else
+ f->next_seg--;
+ f->valid_bits = 0;
+
+ return 1;
+}
+
+int stb_vorbis_seek_frame(stb_vorbis *f, unsigned int sample_number)
+{
+ uint32 max_frame_samples;
+
+ if (IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+
+ // fast page-level search
+ if (!seek_to_sample_coarse(f, sample_number))
+ return 0;
+
+ assert(f->current_loc_valid);
+ assert(f->current_loc <= sample_number);
+
+ // linear search for the relevant packet
+ max_frame_samples = (f->blocksize_1*3 - f->blocksize_0) >> 2;
+ while (f->current_loc < sample_number) {
+ int left_start, left_end, right_start, right_end, mode, frame_samples;
+ if (!peek_decode_initial(f, &left_start, &left_end, &right_start, &right_end, &mode))
+ return error(f, VORBIS_seek_failed);
+ // calculate the number of samples returned by the next frame
+ frame_samples = right_start - left_start;
+ if (f->current_loc + frame_samples > sample_number) {
+ return 1; // the next frame will contain the sample
+ } else if (f->current_loc + frame_samples + max_frame_samples > sample_number) {
+ // there's a chance the frame after this could contain the sample
+ vorbis_pump_first_frame(f);
+ } else {
+ // this frame is too early to be relevant
+ f->current_loc += frame_samples;
+ f->previous_length = 0;
+ maybe_start_packet(f);
+ flush_packet(f);
+ }
+ }
+ // the next frame should start with the sample
+ if (f->current_loc != sample_number) return error(f, VORBIS_seek_failed);
+ return 1;
+}
+
+int stb_vorbis_seek(stb_vorbis *f, unsigned int sample_number)
+{
+ if (!stb_vorbis_seek_frame(f, sample_number))
+ return 0;
+
+ if (sample_number != f->current_loc) {
+ int n;
+ uint32 frame_start = f->current_loc;
+ stb_vorbis_get_frame_float(f, &n, NULL);
+ assert(sample_number > frame_start);
+ assert(f->channel_buffer_start + (int) (sample_number-frame_start) <= f->channel_buffer_end);
+ f->channel_buffer_start += (sample_number - frame_start);
+ }
+
+ return 1;
+}
+
+int stb_vorbis_seek_start(stb_vorbis *f)
+{
+ if (IS_PUSH_MODE(f)) { return error(f, VORBIS_invalid_api_mixing); }
+ set_file_offset(f, f->first_audio_page_offset);
+ f->previous_length = 0;
+ f->first_decode = TRUE;
+ f->next_seg = -1;
+ return vorbis_pump_first_frame(f);
+}
+
+unsigned int stb_vorbis_stream_length_in_samples(stb_vorbis *f)
+{
+ unsigned int restore_offset, previous_safe;
+ unsigned int end, last_page_loc;
+
+ if (IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+ if (!f->total_samples) {
+ unsigned int last;
+ uint32 lo,hi;
+ char header[6];
+
+ // first, store the current decode position so we can restore it
+ restore_offset = stb_vorbis_get_file_offset(f);
+
+ // now we want to seek back 64K from the end (the last page must
+ // be at most a little less than 64K, but let's allow a little slop)
+ if (f->stream_len >= 65536 && f->stream_len-65536 >= f->first_audio_page_offset)
+ previous_safe = f->stream_len - 65536;
+ else
+ previous_safe = f->first_audio_page_offset;
+
+ set_file_offset(f, previous_safe);
+ // previous_safe is now our candidate 'earliest known place that seeking
+ // to will lead to the final page'
+
+ if (!vorbis_find_page(f, &end, &last)) {
+ // if we can't find a page, we're hosed!
+ f->error = VORBIS_cant_find_last_page;
+ f->total_samples = 0xffffffff;
+ goto done;
+ }
+
+ // check if there are more pages
+ last_page_loc = stb_vorbis_get_file_offset(f);
+
+ // stop when the last_page flag is set, not when we reach eof;
+ // this allows us to stop short of a 'file_section' end without
+ // explicitly checking the length of the section
+ while (!last) {
+ set_file_offset(f, end);
+ if (!vorbis_find_page(f, &end, &last)) {
+ // the last page we found didn't have the 'last page' flag
+ // set. whoops!
+ break;
+ }
+ //previous_safe = last_page_loc+1; // NOTE: not used after this point, but note for debugging
+ last_page_loc = stb_vorbis_get_file_offset(f);
+ }
+
+ set_file_offset(f, last_page_loc);
+
+ // parse the header
+ getn(f, (unsigned char *)header, 6);
+ // extract the absolute granule position
+ lo = get32(f);
+ hi = get32(f);
+ if (lo == 0xffffffff && hi == 0xffffffff) {
+ f->error = VORBIS_cant_find_last_page;
+ f->total_samples = SAMPLE_unknown;
+ goto done;
+ }
+ if (hi)
+ lo = 0xfffffffe; // saturate
+ f->total_samples = lo;
+
+ f->p_last.page_start = last_page_loc;
+ f->p_last.page_end = end;
+ f->p_last.last_decoded_sample = lo;
+
+ done:
+ set_file_offset(f, restore_offset);
+ }
+ return f->total_samples == SAMPLE_unknown ? 0 : f->total_samples;
+}
+
+float stb_vorbis_stream_length_in_seconds(stb_vorbis *f)
+{
+ return stb_vorbis_stream_length_in_samples(f) / (float) f->sample_rate;
+}
+
+
+
+int stb_vorbis_get_frame_float(stb_vorbis *f, int *channels, float ***output)
+{
+ int len, right,left,i;
+ if (IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+
+ if (!vorbis_decode_packet(f, &len, &left, &right)) {
+ f->channel_buffer_start = f->channel_buffer_end = 0;
+ return 0;
+ }
+
+ len = vorbis_finish_frame(f, len, left, right);
+ for (i=0; i < f->channels; ++i)
+ f->outputs[i] = f->channel_buffers[i] + left;
+
+ f->channel_buffer_start = left;
+ f->channel_buffer_end = left+len;
+
+ if (channels) *channels = f->channels;
+ if (output) *output = f->outputs;
+ return len;
+}
+
+#ifndef STB_VORBIS_NO_STDIO
+
+stb_vorbis * stb_vorbis_open_file_section(FILE *file, int close_on_free, int *error, const stb_vorbis_alloc *alloc, unsigned int length)
+{
+ stb_vorbis *f, p;
+ vorbis_init(&p, alloc);
+ p.f = file;
+ p.f_start = (uint32) ftell(file);
+ p.stream_len = length;
+ p.close_on_free = close_on_free;
+ if (start_decoder(&p)) {
+ f = vorbis_alloc(&p);
+ if (f) {
+ *f = p;
+ vorbis_pump_first_frame(f);
+ return f;
+ }
+ }
+ if (error) *error = p.error;
+ vorbis_deinit(&p);
+ return NULL;
+}
+
+stb_vorbis * stb_vorbis_open_file(FILE *file, int close_on_free, int *error, const stb_vorbis_alloc *alloc)
+{
+ unsigned int len, start;
+ start = (unsigned int) ftell(file);
+ fseek(file, 0, SEEK_END);
+ len = (unsigned int) (ftell(file) - start);
+ fseek(file, start, SEEK_SET);
+ return stb_vorbis_open_file_section(file, close_on_free, error, alloc, len);
+}
+
+stb_vorbis * stb_vorbis_open_filename(const char *filename, int *error, const stb_vorbis_alloc *alloc)
+{
+ FILE *f;
+#if defined(_WIN32) && defined(__STDC_WANT_SECURE_LIB__)
+ if (0 != fopen_s(&f, filename, "rb"))
+ f = NULL;
+#else
+ f = fopen(filename, "rb");
+#endif
+ if (f)
+ return stb_vorbis_open_file(f, TRUE, error, alloc);
+ if (error) *error = VORBIS_file_open_failure;
+ return NULL;
+}
+#endif // STB_VORBIS_NO_STDIO
+
+stb_vorbis * stb_vorbis_open_memory(const unsigned char *data, int len, int *error, const stb_vorbis_alloc *alloc)
+{
+ stb_vorbis *f, p;
+ if (!data) {
+ if (error) *error = VORBIS_unexpected_eof;
+ return NULL;
+ }
+ vorbis_init(&p, alloc);
+ p.stream = (uint8 *) data;
+ p.stream_end = (uint8 *) data + len;
+ p.stream_start = (uint8 *) p.stream;
+ p.stream_len = len;
+ p.push_mode = FALSE;
+ if (start_decoder(&p)) {
+ f = vorbis_alloc(&p);
+ if (f) {
+ *f = p;
+ vorbis_pump_first_frame(f);
+ if (error) *error = VORBIS__no_error;
+ return f;
+ }
+ }
+ if (error) *error = p.error;
+ vorbis_deinit(&p);
+ return NULL;
+}
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+#define PLAYBACK_MONO 1
+#define PLAYBACK_LEFT 2
+#define PLAYBACK_RIGHT 4
+
+#define L (PLAYBACK_LEFT | PLAYBACK_MONO)
+#define C (PLAYBACK_LEFT | PLAYBACK_RIGHT | PLAYBACK_MONO)
+#define R (PLAYBACK_RIGHT | PLAYBACK_MONO)
+
+static int8 channel_position[7][6] =
+{
+ { 0 },
+ { C },
+ { L, R },
+ { L, C, R },
+ { L, R, L, R },
+ { L, C, R, L, R },
+ { L, C, R, L, R, C },
+};
+
+
+#ifndef STB_VORBIS_NO_FAST_SCALED_FLOAT
+ typedef union {
+ float f;
+ int i;
+ } float_conv;
+ typedef char stb_vorbis_float_size_test[sizeof(float)==4 && sizeof(int) == 4];
+ #define FASTDEF(x) float_conv x
+ // add (1<<23) to convert to int, then divide by 2^SHIFT, then add 0.5/2^SHIFT to round
+ #define MAGIC(SHIFT) (1.5f * (1 << (23-SHIFT)) + 0.5f/(1 << SHIFT))
+ #define ADDEND(SHIFT) (((150-SHIFT) << 23) + (1 << 22))
+ #define FAST_SCALED_FLOAT_TO_INT(temp,x,s) (temp.f = (x) + MAGIC(s), temp.i - ADDEND(s))
+ #define check_endianness()
+#else
+ #define FAST_SCALED_FLOAT_TO_INT(temp,x,s) ((int) ((x) * (1 << (s))))
+ #define check_endianness()
+ #define FASTDEF(x)
+#endif
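When `STB_VORBIS_NO_FAST_SCALED_FLOAT` is not defined, `FAST_SCALED_FLOAT_TO_INT` avoids a float-to-int cast: adding the magic constant `1.5f * 2^(23-s)` pins the sum's exponent so that `x * 2^s` lands in the low mantissa bits of an IEEE 754 single, and subtracting the bit pattern of that constant recovers the integer. A standalone sketch of the same trick, assuming 32-bit IEEE 754 `float` and 32-bit `int` (which the `stb_vorbis_float_size_test` typedef above already checks):

```c
#include <assert.h>

typedef union {
   float f;
   int i;
} float_conv;

/* Adding 1.5 * 2^(23-shift) forces the sum's exponent, so the scaled value
   x * 2^shift occupies the low mantissa bits; subtracting the magic
   constant's integer bit pattern then yields (approximately) the rounded
   scaled value. Requires IEEE 754 binary32 floats. */
static int fast_scaled_float_to_int(float x, int shift)
{
   float_conv temp;
   temp.f = x + (1.5f * (float)(1 << (23 - shift)) + 0.5f / (float)(1 << shift));
   return temp.i - (((150 - shift) << 23) + (1 << 22));
}
```

The result agrees with a plain truncating `(int)(x * (1 << shift))` to within one unit, which is harmless here because the callers clamp to the 16-bit range immediately afterward anyway.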
+
+static void copy_samples(short *dest, float *src, int len)
+{
+ int i;
+ check_endianness();
+ for (i=0; i < len; ++i) {
+ FASTDEF(temp);
+ int v = FAST_SCALED_FLOAT_TO_INT(temp, src[i],15);
+ if ((unsigned int) (v + 32768) > 65535)
+ v = v < 0 ? -32768 : 32767;
+ dest[i] = v;
+ }
+}
+
+static void compute_samples(int mask, short *output, int num_c, float **data, int d_offset, int len)
+{
+ #define STB_BUFFER_SIZE 32
+ float buffer[STB_BUFFER_SIZE];
+ int i,j,o,n = STB_BUFFER_SIZE;
+ check_endianness();
+ for (o = 0; o < len; o += STB_BUFFER_SIZE) {
+ memset(buffer, 0, sizeof(buffer));
+ if (o + n > len) n = len - o;
+ for (j=0; j < num_c; ++j) {
+ if (channel_position[num_c][j] & mask) {
+ for (i=0; i < n; ++i)
+ buffer[i] += data[j][d_offset+o+i];
+ }
+ }
+ for (i=0; i < n; ++i) {
+ FASTDEF(temp);
+ int v = FAST_SCALED_FLOAT_TO_INT(temp,buffer[i],15);
+ if ((unsigned int) (v + 32768) > 65535)
+ v = v < 0 ? -32768 : 32767;
+ output[o+i] = v;
+ }
+ }
+ #undef STB_BUFFER_SIZE
+}
+
+static void compute_stereo_samples(short *output, int num_c, float **data, int d_offset, int len)
+{
+ #define STB_BUFFER_SIZE 32
+ float buffer[STB_BUFFER_SIZE];
+ int i,j,o,n = STB_BUFFER_SIZE >> 1;
+ // o is the offset in the source data
+ check_endianness();
+ for (o = 0; o < len; o += STB_BUFFER_SIZE >> 1) {
+ // o2 is the offset in the output data
+ int o2 = o << 1;
+ memset(buffer, 0, sizeof(buffer));
+ if (o + n > len) n = len - o;
+ for (j=0; j < num_c; ++j) {
+ int m = channel_position[num_c][j] & (PLAYBACK_LEFT | PLAYBACK_RIGHT);
+ if (m == (PLAYBACK_LEFT | PLAYBACK_RIGHT)) {
+ for (i=0; i < n; ++i) {
+ buffer[i*2+0] += data[j][d_offset+o+i];
+ buffer[i*2+1] += data[j][d_offset+o+i];
+ }
+ } else if (m == PLAYBACK_LEFT) {
+ for (i=0; i < n; ++i) {
+ buffer[i*2+0] += data[j][d_offset+o+i];
+ }
+ } else if (m == PLAYBACK_RIGHT) {
+ for (i=0; i < n; ++i) {
+ buffer[i*2+1] += data[j][d_offset+o+i];
+ }
+ }
+ }
+ for (i=0; i < (n<<1); ++i) {
+ FASTDEF(temp);
+ int v = FAST_SCALED_FLOAT_TO_INT(temp,buffer[i],15);
+ if ((unsigned int) (v + 32768) > 65535)
+ v = v < 0 ? -32768 : 32767;
+ output[o2+i] = v;
+ }
+ }
+ #undef STB_BUFFER_SIZE
+}
+
+static void convert_samples_short(int buf_c, short **buffer, int b_offset, int data_c, float **data, int d_offset, int samples)
+{
+ int i;
+ if (buf_c != data_c && buf_c <= 2 && data_c <= 6) {
+ static int channel_selector[3][2] = { {0}, {PLAYBACK_MONO}, {PLAYBACK_LEFT, PLAYBACK_RIGHT} };
+ for (i=0; i < buf_c; ++i)
+ compute_samples(channel_selector[buf_c][i], buffer[i]+b_offset, data_c, data, d_offset, samples);
+ } else {
+ int limit = buf_c < data_c ? buf_c : data_c;
+ for (i=0; i < limit; ++i)
+ copy_samples(buffer[i]+b_offset, data[i]+d_offset, samples);
+ for ( ; i < buf_c; ++i)
+ memset(buffer[i]+b_offset, 0, sizeof(short) * samples);
+ }
+}
+
+int stb_vorbis_get_frame_short(stb_vorbis *f, int num_c, short **buffer, int num_samples)
+{
+ float **output = NULL;
+ int len = stb_vorbis_get_frame_float(f, NULL, &output);
+ if (len > num_samples) len = num_samples;
+ if (len)
+ convert_samples_short(num_c, buffer, 0, f->channels, output, 0, len);
+ return len;
+}
+
+static void convert_channels_short_interleaved(int buf_c, short *buffer, int data_c, float **data, int d_offset, int len)
+{
+ int i;
+ check_endianness();
+ if (buf_c != data_c && buf_c <= 2 && data_c <= 6) {
+ assert(buf_c == 2);
+ for (i=0; i < buf_c; ++i)
+ compute_stereo_samples(buffer, data_c, data, d_offset, len);
+ } else {
+ int limit = buf_c < data_c ? buf_c : data_c;
+ int j;
+ for (j=0; j < len; ++j) {
+ for (i=0; i < limit; ++i) {
+ FASTDEF(temp);
+ float f = data[i][d_offset+j];
+            int v = FAST_SCALED_FLOAT_TO_INT(temp, f, 15);
+ if ((unsigned int) (v + 32768) > 65535)
+ v = v < 0 ? -32768 : 32767;
+ *buffer++ = v;
+ }
+ for ( ; i < buf_c; ++i)
+ *buffer++ = 0;
+ }
+ }
+}
+
+int stb_vorbis_get_frame_short_interleaved(stb_vorbis *f, int num_c, short *buffer, int num_shorts)
+{
+ float **output;
+ int len;
+ if (num_c == 1) return stb_vorbis_get_frame_short(f,num_c,&buffer, num_shorts);
+ len = stb_vorbis_get_frame_float(f, NULL, &output);
+ if (len) {
+ if (len*num_c > num_shorts) len = num_shorts / num_c;
+ convert_channels_short_interleaved(num_c, buffer, f->channels, output, 0, len);
+ }
+ return len;
+}
+
+int stb_vorbis_get_samples_short_interleaved(stb_vorbis *f, int channels, short *buffer, int num_shorts)
+{
+ float **outputs;
+ int len = num_shorts / channels;
+ int n=0;
+ while (n < len) {
+ int k = f->channel_buffer_end - f->channel_buffer_start;
+ if (n+k >= len) k = len - n;
+ if (k)
+ convert_channels_short_interleaved(channels, buffer, f->channels, f->channel_buffers, f->channel_buffer_start, k);
+ buffer += k*channels;
+ n += k;
+ f->channel_buffer_start += k;
+ if (n == len) break;
+ if (!stb_vorbis_get_frame_float(f, NULL, &outputs)) break;
+ }
+ return n;
+}
+
+int stb_vorbis_get_samples_short(stb_vorbis *f, int channels, short **buffer, int len)
+{
+ float **outputs;
+ int n=0;
+ while (n < len) {
+ int k = f->channel_buffer_end - f->channel_buffer_start;
+ if (n+k >= len) k = len - n;
+ if (k)
+ convert_samples_short(channels, buffer, n, f->channels, f->channel_buffers, f->channel_buffer_start, k);
+ n += k;
+ f->channel_buffer_start += k;
+ if (n == len) break;
+ if (!stb_vorbis_get_frame_float(f, NULL, &outputs)) break;
+ }
+ return n;
+}
+
+#ifndef STB_VORBIS_NO_STDIO
+int stb_vorbis_decode_filename(const char *filename, int *channels, int *sample_rate, short **output)
+{
+ int data_len, offset, total, limit, error;
+ short *data;
+ stb_vorbis *v = stb_vorbis_open_filename(filename, &error, NULL);
+ if (v == NULL) return -1;
+ limit = v->channels * 4096;
+ *channels = v->channels;
+ if (sample_rate)
+ *sample_rate = v->sample_rate;
+ offset = data_len = 0;
+ total = limit;
+ data = (short *) malloc(total * sizeof(*data));
+ if (data == NULL) {
+ stb_vorbis_close(v);
+ return -2;
+ }
+ for (;;) {
+ int n = stb_vorbis_get_frame_short_interleaved(v, v->channels, data+offset, total-offset);
+ if (n == 0) break;
+ data_len += n;
+ offset += n * v->channels;
+ if (offset + limit > total) {
+ short *data2;
+ total *= 2;
+ data2 = (short *) realloc(data, total * sizeof(*data));
+ if (data2 == NULL) {
+ free(data);
+ stb_vorbis_close(v);
+ return -2;
+ }
+ data = data2;
+ }
+ }
+ *output = data;
+ stb_vorbis_close(v);
+ return data_len;
+}
+#endif // STB_VORBIS_NO_STDIO
+
+int stb_vorbis_decode_memory(const uint8 *mem, int len, int *channels, int *sample_rate, short **output)
+{
+ int data_len, offset, total, limit, error;
+ short *data;
+ stb_vorbis *v = stb_vorbis_open_memory(mem, len, &error, NULL);
+ if (v == NULL) return -1;
+ limit = v->channels * 4096;
+ *channels = v->channels;
+ if (sample_rate)
+ *sample_rate = v->sample_rate;
+ offset = data_len = 0;
+ total = limit;
+ data = (short *) malloc(total * sizeof(*data));
+ if (data == NULL) {
+ stb_vorbis_close(v);
+ return -2;
+ }
+ for (;;) {
+ int n = stb_vorbis_get_frame_short_interleaved(v, v->channels, data+offset, total-offset);
+ if (n == 0) break;
+ data_len += n;
+ offset += n * v->channels;
+ if (offset + limit > total) {
+ short *data2;
+ total *= 2;
+ data2 = (short *) realloc(data, total * sizeof(*data));
+ if (data2 == NULL) {
+ free(data);
+ stb_vorbis_close(v);
+ return -2;
+ }
+ data = data2;
+ }
+ }
+ *output = data;
+ stb_vorbis_close(v);
+ return data_len;
+}
+#endif // STB_VORBIS_NO_INTEGER_CONVERSION
+
+int stb_vorbis_get_samples_float_interleaved(stb_vorbis *f, int channels, float *buffer, int num_floats)
+{
+ float **outputs;
+ int len = num_floats / channels;
+ int n=0;
+ int z = f->channels;
+ if (z > channels) z = channels;
+ while (n < len) {
+ int i,j;
+ int k = f->channel_buffer_end - f->channel_buffer_start;
+ if (n+k >= len) k = len - n;
+ for (j=0; j < k; ++j) {
+ for (i=0; i < z; ++i)
+ *buffer++ = f->channel_buffers[i][f->channel_buffer_start+j];
+ for ( ; i < channels; ++i)
+ *buffer++ = 0;
+ }
+ n += k;
+ f->channel_buffer_start += k;
+ if (n == len)
+ break;
+ if (!stb_vorbis_get_frame_float(f, NULL, &outputs))
+ break;
+ }
+ return n;
+}
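The loop above converts the decoder's planar `channel_buffers` into an interleaved stream, copying `min(f->channels, channels)` real channels per sample and zero-filling any extra output channels the caller asked for. The inner copy can be sketched in isolation like this (names are illustrative, not from this file):

```c
#include <assert.h>

/* Interleave n samples from src_c planar channels into dst_c interleaved
   channels, zero-filling any destination channels beyond what src provides. */
static void interleave(float *dst, int dst_c, float *const *src, int src_c, int n)
{
   int i, j;
   int z = src_c < dst_c ? src_c : dst_c;  /* channels actually copied */
   for (j = 0; j < n; ++j) {
      for (i = 0; i < z; ++i)
         *dst++ = src[i][j];
      for (; i < dst_c; ++i)
         *dst++ = 0;
   }
}
```

The non-interleaved variant that follows (`stb_vorbis_get_samples_float`) is the same idea with `memcpy` per channel instead of a per-sample inner loop, since planar output needs no element-by-element shuffling.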
+
+int stb_vorbis_get_samples_float(stb_vorbis *f, int channels, float **buffer, int num_samples)
+{
+ float **outputs;
+ int n=0;
+ int z = f->channels;
+ if (z > channels) z = channels;
+ while (n < num_samples) {
+ int i;
+ int k = f->channel_buffer_end - f->channel_buffer_start;
+ if (n+k >= num_samples) k = num_samples - n;
+ if (k) {
+ for (i=0; i < z; ++i)
+ memcpy(buffer[i]+n, f->channel_buffers[i]+f->channel_buffer_start, sizeof(float)*k);
+ for ( ; i < channels; ++i)
+ memset(buffer[i]+n, 0, sizeof(float) * k);
+ }
+ n += k;
+ f->channel_buffer_start += k;
+ if (n == num_samples)
+ break;
+ if (!stb_vorbis_get_frame_float(f, NULL, &outputs))
+ break;
+ }
+ return n;
+}
+#endif // STB_VORBIS_NO_PULLDATA_API
+
+/* Version history
+ 1.17 - 2019-07-08 - fix CVE-2019-13217, -13218, -13219, -13220, -13221, -13222, -13223
+ found with Mayhem by ForAllSecure
+ 1.16 - 2019-03-04 - fix warnings
+ 1.15 - 2019-02-07 - explicit failure if Ogg Skeleton data is found
+ 1.14 - 2018-02-11 - delete bogus dealloca usage
+ 1.13 - 2018-01-29 - fix truncation of last frame (hopefully)
+ 1.12 - 2017-11-21 - limit residue begin/end to blocksize/2 to avoid large temp allocs in bad/corrupt files
+ 1.11 - 2017-07-23 - fix MinGW compilation
+ 1.10 - 2017-03-03 - more robust seeking; fix negative ilog(); clear error in open_memory
+ 1.09 - 2016-04-04 - back out 'avoid discarding last frame' fix from previous version
+ 1.08 - 2016-04-02 - fixed multiple warnings; fix setup memory leaks;
+ avoid discarding last frame of audio data
+ 1.07 - 2015-01-16 - fixed some warnings, fix mingw, const-correct API
+ some more crash fixes when out of memory or with corrupt files
+ 1.06 - 2015-08-31 - full, correct support for seeking API (Dougall Johnson)
+ some crash fixes when out of memory or with corrupt files
+ 1.05 - 2015-04-19 - don't define __forceinline if it's redundant
+ 1.04 - 2014-08-27 - fix missing const-correct case in API
+ 1.03 - 2014-08-07 - Warning fixes
+ 1.02 - 2014-07-09 - Declare qsort compare function _cdecl on windows
+ 1.01 - 2014-06-18 - fix stb_vorbis_get_samples_float
+ 1.0 - 2014-05-26 - fix memory leaks; fix warnings; fix bugs in multichannel
+ (API change) report sample rate for decode-full-file funcs
+ 0.99996 - bracket #include for macintosh compilation by Laurent Gomila
+ 0.99995 - use union instead of pointer-cast for fast-float-to-int to avoid alias-optimization problem
+ 0.99994 - change fast-float-to-int to work in single-precision FPU mode, remove endian-dependence
+ 0.99993 - remove assert that fired on legal files with empty tables
+ 0.99992 - rewind-to-start
+ 0.99991 - bugfix to stb_vorbis_get_samples_short by Bernhard Wodo
+ 0.9999 - (should have been 0.99990) fix no-CRT support, compiling as C++
+ 0.9998 - add a full-decode function with a memory source
+ 0.9997 - fix a bug in the read-from-FILE case in 0.9996 addition
+ 0.9996 - query length of vorbis stream in samples/seconds
+ 0.9995 - bugfix to another optimization that only happened in certain files
+ 0.9994 - bugfix to one of the optimizations that caused significant (but inaudible?) errors
+ 0.9993 - performance improvements; runs in 99% to 104% of time of reference implementation
+ 0.9992 - performance improvement of IMDCT; now performs close to reference implementation
+ 0.9991 - performance improvement of IMDCT
+ 0.999 - (should have been 0.9990) performance improvement of IMDCT
+ 0.998 - no-CRT support from Casey Muratori
+ 0.997 - bugfixes for bugs found by Terje Mathisen
+ 0.996 - bugfix: fast-huffman decode initialized incorrectly for sparse codebooks; fixing gives 10% speedup - found by Terje Mathisen
+ 0.995 - bugfix: fix to 'effective' overrun detection - found by Terje Mathisen
+ 0.994 - bugfix: garbage decode on final VQ symbol of a non-multiple - found by Terje Mathisen
+ 0.993 - bugfix: pushdata API required 1 extra byte for empty page (failed to consume final page if empty) - found by Terje Mathisen
+ 0.992 - fixes for MinGW warning
+ 0.991 - turn fast-float-conversion on by default
+ 0.990 - fix push-mode seek recovery if you seek into the headers
+ 0.98b - fix to bad release of 0.98
+ 0.98 - fix push-mode seek recovery; robustify float-to-int and support non-fast mode
+ 0.97 - builds under c++ (typecasting, don't use 'class' keyword)
+ 0.96 - somehow MY 0.95 was right, but the web one was wrong, so here's my 0.95 rereleased as 0.96, fixes a typo in the clamping code
+ 0.95 - clamping code for 16-bit functions
+ 0.94 - not publicly released
+ 0.93 - fixed all-zero-floor case (was decoding garbage)
+ 0.92 - fixed a memory leak
+ 0.91 - conditional compiles to omit parts of the API and the infrastructure to support them: STB_VORBIS_NO_PULLDATA_API, STB_VORBIS_NO_PUSHDATA_API, STB_VORBIS_NO_STDIO, STB_VORBIS_NO_INTEGER_CONVERSION
+ 0.90 - first public release
+*/
+
+#endif // STB_VORBIS_HEADER_ONLY
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/stb_voxel_render.h b/vendor/stb/stb_voxel_render.h
new file mode 100644
index 0000000..2e7a372
--- /dev/null
+++ b/vendor/stb/stb_voxel_render.h
@@ -0,0 +1,3807 @@
+// stb_voxel_render.h - v0.89 - Sean Barrett, 2015 - public domain
+//
+// This library helps render large-scale "voxel" worlds for games,
+// in this case, one with blocks that can have textures and that
+// can also be a few shapes other than cubes.
+//
+// Video introduction:
+// http://www.youtube.com/watch?v=2vnTtiLrV1w
+//
+// Minecraft-viewer sample app (not very simple though):
+// http://github.com/nothings/stb/tree/master/tests/caveview
+//
+// It works by creating triangle meshes. The library includes
+//
+// - converter from dense 3D arrays of block info to vertex mesh
+// - vertex & fragment shaders for the vertex mesh
+// - assistance in setting up shader state
+//
+// For portability, none of the library code actually accesses
+// the 3D graphics API. (At the moment, it's not actually portable
+// since the shaders are GLSL only, but patches are welcome.)
+//
+// You have to do all the caching and tracking of vertex buffers
+// yourself. However, you could also try making a game with
+// a small enough world that it's fully loaded rather than
+// streaming. Currently the preferred vertex format is 20 bytes
+// per quad. There are designs to allow much more compact formats
+// with a slight reduction in shader features, but no roadmap
+// for actually implementing them.
+//
+//
+// USAGE
+//
+// #define the symbol STB_VOXEL_RENDER_IMPLEMENTATION in *one*
+// C/C++ file before the #include of this file; the implementation
+// will be generated in that file.
+//
+// If you define the symbol STB_VOXEL_RENDER_STATIC, then the
+// implementation will be private to that file.
+//
+//
+// FEATURES
+//
+// - you can choose textured blocks with the features below,
+// or colored voxels with 2^24 colors and no textures.
+//
+// - voxels are mostly just cubes, but there's support for
+// half-height cubes and diagonal slopes, half-height
+// diagonals, and even odder shapes especially for doing
+// more-continuous "ground".
+//
+// - texture coordinates are projections along one of the major
+// axes, with per-texture scaling.
+//
+// - a number of aspects of the shader and the vertex format
+// are configurable; the library generally takes care of
+// coordinating the vertex format with the mesh for you.
+//
+//
+// FEATURES (SHADER PERSPECTIVE)
+//
+// - vertices aligned on integer lattice, z on multiples of 0.5
+// - per-vertex "lighting" or "ambient occlusion" value (6 bits)
+// - per-vertex texture crossfade (3 bits)
+//
+// - per-face texture #1 id (8-bit index into array texture)
+// - per-face texture #2 id (8-bit index into second array texture)
+// - per-face color (6-bit palette index, 2 bits of per-texture boolean enable)
+// - per-face 5-bit normal for lighting calculations & texture coord computation
+// - per-face 2-bit texture matrix rotation to rotate faces
+//
+// - indexed-by-texture-id scale factor (separate for texture #1 and texture #2)
+// - indexed-by-texture-#2-id blend mode (alpha composite or modulate/multiply);
+// the first is good for decals, the second for detail textures, "light maps",
+// etc; both modes are controlled by texture #2's alpha, scaled by the
+// per-vertex texture crossfade and the per-face color (if enabled on texture #2);
+// modulate/multiply multiplies by an extra factor of 2.0 so that if you
+// make detail maps whose average brightness is 0.5 everything works nicely.
+//
+// - ambient lighting: half-lambert directional plus constant, all scaled by vertex ao
+// - face can be fullbright (emissive), controlled by per-face color
+// - installable lighting, with default single-point-light
+// - installable fog, with default hacked smoothstep
+//
+// Note that all the variations of lighting selection and texture
+// blending are run-time conditions in the shader, so they can be
+// intermixed in a single mesh.
+//
+//
+// INTEGRATION ARC
+//
+// The way to get this library to work from scratch is to do the following:
+//
+// Step 1. define STBVOX_CONFIG_MODE to 0
+//
+// This mode uses only vertex attributes and uniforms, and is easiest
+// to get working. It requires 32 bytes per quad and limits the
+// size of some tables to avoid hitting uniform limits.
+//
+// Step 2. define STBVOX_CONFIG_MODE to 1
+//
+// This requires using a texture buffer to store the quad data,
+// reducing the size to 20 bytes per quad.
+//
+// Step 3: define STBVOX_CONFIG_PREFER_TEXBUFFER
+//
+// This causes some uniforms to be stored as texture buffers
+// instead. This increases the size of some of those tables,
+// and avoids a potential slow path (gathering non-uniform
+// data from uniforms) on some hardware.
+//
+// In the future I might add additional modes that have significantly
+// smaller meshes but reduce features, down as small as 6 bytes per quad.
+// See elsewhere in this file for a table of candidate modes. Switching
+// to a mode will require changing some of your mesh creation code, but
+// everything else should be seamless. (And I'd like to change the API
+// so that mesh creation is data-driven the way the uniforms are, and
+// then you wouldn't even have to change anything but the mode number.)
+//
+//
+// IMPROVEMENTS FOR SHIP-WORTHY PROGRAMS USING THIS LIBRARY
+//
+// I currently tolerate a certain level of "bugginess" in this library.
+//
+// I'm referring to things which look a little wrong (as long as they
+// don't cause holes or cracks in the output meshes), or things which
+// do not produce as optimal a mesh as possible. Notable examples:
+//
+// - incorrect lighting on slopes
+// - inefficient meshes for vheight blocks
+//
+// I am willing to do the work to improve these things if someone is
+// going to ship a substantial program that would be improved by them.
+// (It need not be commercial, nor need it be a game.) I just didn't
+// want to do the work up front if it might never be leveraged. So just
+// submit a bug report as usual (github is preferred), but add a note
+// that this is for a thing that is really going to ship. (That means
+// you need to be far enough into the project that it's clear you're
+// committed to it; not during early exploratory development.)
+//
+//
+// VOXEL MESH API
+//
+// Context
+//
+// To understand the API, make sure you first understand the feature set
+// listed above.
+//
+// Because the vertices are compact, they have very limited spatial
+// precision. Thus a single mesh can only contain the data for a limited
+// area. To make very large voxel maps, you'll need to build multiple
+// vertex buffers. (But you want this anyway for frustum culling.)
+//
+// Each generated mesh has three components:
+// - vertex data (vertex buffer)
+// - face data (optional, stored in texture buffer)
+// - mesh transform (uniforms)
+//
+// Once you've generated the mesh with this library, it's up to you
+// to upload it to the GPU, to keep track of the state, and to render
+// it.
+//
+// Concept
+//
+// The basic design is that you pass in one or more 3D arrays; each array
+// is (typically) one byte per voxel and contains information about one
+// or more properties of each voxel.
+//
+// Because there is so much per-vertex and per-face data possible
+// in the output, and each voxel can have 6 faces and 8 vertices, it
+// would require a very large data structure to describe all
+// of the possibilities, and this would cause the mesh-creation
+// process to be slow. Instead, the API provides multiple ways
+// to express each property, some more compact, others less so;
+// each such way has some limitations on what it can express.
+//
+// Note that because there are so many paths and combinations, not all
+// of them have been tested. Just report bugs and I'll fix 'em.
+//
+// Details
+//
+// See the API documentation in the header-file section.
+//
+//
+// CONTRIBUTORS
+//
+// Features Porting Bugfixes & Warnings
+// Sean Barrett github:r-leyh Jesus Fernandez
+// Miguel Lechon github:Arbeiterunfallversicherungsgesetz
+// Thomas Frase James Hofmann
+// Stephen Olsen github:guitarfreak
+//
+// VERSION HISTORY
+//
+// 0.89 (2020-02-02) bugfix in sample code
+// 0.88 (2019-03-04) fix warnings
+// 0.87 (2019-02-25) fix warning
+// 0.86 (2019-02-07) fix typos in comments
+// 0.85 (2017-03-03) add block_selector (by guitarfreak)
+// 0.84 (2016-04-02) fix GLSL syntax error on glModelView path
+// 0.83 (2015-09-13) remove non-constant struct initializers to support more compilers
+// 0.82 (2015-08-01) added input.packed_compact to store rot, vheight & texlerp efficiently
+// fix broken tex_overlay2
+// 0.81 (2015-05-28) fix broken STBVOX_CONFIG_OPTIMIZED_VHEIGHT
+// 0.80 (2015-04-11) fix broken STBVOX_CONFIG_ROTATION_IN_LIGHTING refactoring
+// change STBVOX_MAKE_LIGHTING to STBVOX_MAKE_LIGHTING_EXT so
+// that header defs don't need to see config vars
+// add STBVOX_CONFIG_VHEIGHT_IN_LIGHTING and other vheight fixes
+// added documentation for vheight ("weird slopes")
+// 0.79 (2015-04-01) fix the missing types from 0.78; fix string constants being const
+// 0.78 (2015-04-02) bad "#else", compile as C++
+// 0.77 (2015-04-01) documentation tweaks, rename config var to STB_VOXEL_RENDER_STATIC
+// 0.76 (2015-04-01) typos, signed/unsigned shader issue, more documentation
+// 0.75 (2015-04-01) initial release
+//
+//
+// HISTORICAL FOUNDATION
+//
+// stb_voxel_render 20-byte quads 2015/01
+// zmc engine 32-byte quads 2013/12
+// zmc engine 96-byte quads 2011/10
+//
+//
+// LICENSE
+//
+// See end of file for license information.
+
+#ifndef INCLUDE_STB_VOXEL_RENDER_H
+#define INCLUDE_STB_VOXEL_RENDER_H
+
+#include <stdlib.h>
+
+typedef struct stbvox_mesh_maker stbvox_mesh_maker;
+typedef struct stbvox_input_description stbvox_input_description;
+
+#ifdef STB_VOXEL_RENDER_STATIC
+#define STBVXDEC static
+#else
+#define STBVXDEC extern
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// CONFIGURATION MACROS
+//
+// #define STBVOX_CONFIG_MODE <integer> // REQUIRED
+// Configures the overall behavior of stb_voxel_render. This
+// can affect the shaders, the uniform info, and other things.
+// (If you need more than one mode in the same app, you can
+// use STB_VOXEL_RENDER_STATIC to create multiple versions
+// in separate files, and then wrap them.)
+//
+// Mode value Meaning
+// 0 Textured blocks, 32-byte quads
+// 1 Textured blocks, 20-byte quads
+// 20 Untextured blocks, 32-byte quads
+// 21 Untextured blocks, 20-byte quads
+//
+//
+// #define STBVOX_CONFIG_PRECISION_Z <integer> // OPTIONAL
+// Defines the number of bits of fractional position for Z.
+// Only 0 or 1 are valid. 1 is the default. If 0, then a
+// single mesh has twice the legal Z range; e.g. in
+// modes 0,1,20,21, Z in the mesh can extend to 511 instead
+// of 255. However, half-height blocks cannot be used.
+//
+// All of the following are just #ifdef-tested, so they need no values, and all are optional.
+//
+// STBVOX_CONFIG_BLOCKTYPE_SHORT
+// use unsigned 16-bit values for 'blocktype' in the input instead of 8-bit values
+//
+// STBVOX_CONFIG_OPENGL_MODELVIEW
+// use the gl_ModelView matrix rather than the explicit uniform
+//
+// STBVOX_CONFIG_HLSL
+// NOT IMPLEMENTED! Define HLSL shaders instead of GLSL shaders
+//
+// STBVOX_CONFIG_PREFER_TEXBUFFER
+// Stores many of the uniform arrays in texture buffers instead,
+// so they can be larger and may be more efficient on some hardware.
+//
+// STBVOX_CONFIG_LIGHTING_SIMPLE
+// Creates a simple lighting engine with a single point light source
+// in addition to the default half-lambert ambient light.
+//
+// STBVOX_CONFIG_LIGHTING
+// Declares a lighting function hook; you must append a lighting function
+// to the shader before compiling it:
+// vec3 compute_lighting(vec3 pos, vec3 norm, vec3 albedo, vec3 ambient);
+// 'ambient' is the half-lambert ambient light with vertex ambient-occlusion applied
+//
+// STBVOX_CONFIG_FOG_SMOOTHSTEP
+// Defines a simple unrealistic fog system designed to maximize
+// unobscured view distance while not looking too weird when things
+// emerge from the fog. Configured using an extra array element
+// in the STBVOX_UNIFORM_ambient uniform.
+//
+// STBVOX_CONFIG_FOG
+// Defines a fog function hook; you must append a fog function to
+// the shader before compiling it:
+// vec3 compute_fog(vec3 color, vec3 relative_pos, float fragment_alpha);
+// "color" is the incoming pre-fogged color, fragment_alpha is the alpha value,
+// and relative_pos is the vector from the point to the camera in worldspace
+//
+// STBVOX_CONFIG_DISABLE_TEX2
+// This disables all processing of texture 2 in the shader in case
+// you don't use it. Eventually this could be replaced with a mode
+// that omits the unused data entirely.
+//
+// STBVOX_CONFIG_TEX1_EDGE_CLAMP
+// STBVOX_CONFIG_TEX2_EDGE_CLAMP
+// If you want to edge clamp the textures, instead of letting them wrap,
+// set this flag. By default stb_voxel_render relies on texture wrapping
+// to simplify texture coordinate generation. This flag forces it to do
+// it correctly, although there can still be minor artifacts.
+//
+// STBVOX_CONFIG_ROTATION_IN_LIGHTING
+// Changes the meaning of the 'lighting' mesher input variable to also
+// store the rotation; see later discussion.
+//
+// STBVOX_CONFIG_VHEIGHT_IN_LIGHTING
+// Changes the meaning of the 'lighting' mesher input variable to also
+// store the vheight; see later discussion. Cannot use both this and
+// the previous variable.
+//
+// STBVOX_CONFIG_PREMULTIPLIED_ALPHA
+// Adjusts the shader calculations on the assumption that tex1.rgba,
+// tex2.rgba, and color.rgba all use premultiplied values, and that
+// the output of the fragment shader should be premultiplied.
+//
+// STBVOX_CONFIG_UNPREMULTIPLY
+// Only meaningful if STBVOX_CONFIG_PREMULTIPLIED_ALPHA is defined.
+// Changes the behavior described above so that the inputs are
+// still premultiplied alpha, but the output of the fragment
+// shader is not premultiplied alpha. This is needed when allowing
+// non-unit alpha values but not doing alpha-blending (for example
+// when alpha testing).
+//
+
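Putting the configuration macros together, a typical implementation-file setup for textured blocks in the 20-byte-quad mode might look like the sketch below; which optional macros you enable is entirely application-dependent (all names here are the real config macros documented above):

```c
// In exactly *one* C/C++ file, before including the header:
#define STBVOX_CONFIG_MODE 1            // textured blocks, 20-byte quads
#define STBVOX_CONFIG_PREFER_TEXBUFFER  // larger tables, avoids a slow uniform path
#define STBVOX_CONFIG_FOG_SMOOTHSTEP    // simple built-in fog
#define STB_VOXEL_RENDER_IMPLEMENTATION
#include "stb_voxel_render.h"
```

Every other file that uses the library includes the header with none of these defined (except STBVOX_CONFIG_MODE, which must match everywhere).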
+//////////////////////////////////////////////////////////////////////////////
+//
+// MESHING
+//
+// A mesh represents a (typically) small chunk of a larger world.
+// Meshes encode coordinates using small integers, so those
+// coordinates must be relative to some base location.
+// All of the coordinates in the functions below use
+// these relative coordinates unless explicitly stated
+// otherwise.
+//
+// Input to the meshing step is documented further down
+
+STBVXDEC void stbvox_init_mesh_maker(stbvox_mesh_maker *mm);
+// Call this function to initialize a mesh-maker context structure
+// used to build meshes. You should have one context per thread
+// that's building meshes.
+
+STBVXDEC void stbvox_set_buffer(stbvox_mesh_maker *mm, int mesh, int slot, void *buffer, size_t len);
+// Call this to set the buffer into which stbvox will write the mesh
+// it creates. It can build more than one mesh in parallel (distinguished
+// by the 'mesh' parameter), and each mesh can be made up of more than
+// one buffer (distinguished by the 'slot' parameter).
+//
+// Multiple meshes are under your control; use the 'selector' input
+// variable to choose which mesh each voxel's vertices are written to.
+// For example, you can use this to generate separate meshes for opaque
+// and transparent data.
+//
+// You can query the number of slots by calling stbvox_get_buffer_count
+// described below. The meaning of the buffer for each slot depends
+// on STBVOX_CONFIG_MODE.
+//
+// In mode 0 & mode 20, there is only one slot. The mesh data for that
+// slot is two interleaved vertex attributes: attr_vertex, a single
+// 32-bit uint, and attr_face, a single 32-bit uint.
+//
+// In mode 1 & mode 21, there are two slots. The first buffer should
+// be four times as large as the second buffer. The first buffer
+// contains a single vertex attribute: 'attr_vertex', a single 32-bit uint.
+// The second buffer contains texture buffer data (an array of 32-bit uints)
+// that will be accessed through the sampler identified by STBVOX_UNIFORM_face_data.
+
+STBVXDEC int stbvox_get_buffer_count(stbvox_mesh_maker *mm);
+// Returns the number of buffers needed per mesh as described above.
+
+STBVXDEC int stbvox_get_buffer_size_per_quad(stbvox_mesh_maker *mm, int slot);
+// Returns how much of a given buffer will get used per quad. This
+// allows you to choose correct relative sizes for each buffer, although
+// the values are fixed based on the configuration you've selected at
+// compile time, and the details are described in stbvox_set_buffer.
+
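Under mode 1 the two slots have fixed per-quad sizes (16 bytes of attr_vertex, 4 bytes of face data), which is where the 4:1 ratio and the 20-bytes-per-quad figure above come from. A minimal sizing sketch, hard-coding those figures rather than querying stbvox_get_buffer_size_per_quad (the helper names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

// STBVOX_CONFIG_MODE 1: slot 0 holds four 32-bit attr_vertex words per quad,
// slot 1 holds one 32-bit face-data word per quad -- 20 bytes per quad total.
static size_t slot0_bytes(size_t max_quads) { return max_quads * 4 * sizeof(uint32_t); }
static size_t slot1_bytes(size_t max_quads) { return max_quads * 1 * sizeof(uint32_t); }
```

So for a given worst-case quad count you allocate a face-data buffer one quarter the size of the vertex buffer, matching the stbvox_set_buffer notes above.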
+STBVXDEC void stbvox_set_default_mesh(stbvox_mesh_maker *mm, int mesh);
+// Selects which mesh the mesher will output to (see previous function)
+// if the input doesn't specify a per-voxel selector. (I doubt this is
+// useful, but it's here just in case.)
+
+STBVXDEC stbvox_input_description *stbvox_get_input_description(stbvox_mesh_maker *mm);
+// This function call returns a pointer to the stbvox_input_description part
+// of stbvox_mesh_maker (which you should otherwise treat as opaque). You
+// zero this structure, then fill out the relevant pointers to the data
+// describing your voxel object/world.
+//
+// See further documentation at the description of stbvox_input_description below.
+
+STBVXDEC void stbvox_set_input_stride(stbvox_mesh_maker *mm, int x_stride_in_elements, int y_stride_in_elements);
+// This sets the stride between successive elements of the 3D arrays
+// in the stbvox_input_description. Z values are always stored consecutively.
+// (The preferred coordinate system for stbvox is X right, Y forwards, Z up.)
+
+STBVXDEC void stbvox_set_input_range(stbvox_mesh_maker *mm, int x0, int y0, int z0, int x1, int y1, int z1);
+// This sets the range of values in the 3D array for the voxels that
+// the mesh generator will convert. The lower values are inclusive,
+// the higher values are exclusive, so (0,0,0) to (16,16,16) generates
+// mesh data associated with voxels up to (15,15,15) but no higher.
+//
+// The mesh generator generates faces at the boundary between open space
+// and solid space but associates them with the solid space, so if (15,0,0)
+// is open and (16,0,0) is solid, then the mesh will contain the boundary
+// between them if x0 <= 16 and x1 > 16.
+//
+// Note that the mesh generator will access array elements 1 beyond the
+// limits set in these parameters. For example, if you set the limits
+// to be (0,0,0) and (16,16,16), then the generator will access all of
+// the voxels between (-1,-1,-1) and (16,16,16), including (16,16,16).
+// You may have to do pointer arithmetic to make it work.
+//
+// For example, caveview processes mesh chunks that are 32x32x16, but it
+// does this using input buffers that are 34x34x18.
+//
+// The lower limits are x0 >= 0, y0 >= 0, and z0 >= 0.
+//
+// The upper limits are mode dependent, but all the current methods are
+// limited to x1 < 127, y1 < 127, z1 < 255. Note that these are not
+// powers of two; if you want to use power-of-two chunks (to make
+// it efficient to decide which chunk a coordinate falls in), you're
+// limited to at most x1=64, y1=64, z1=128. For classic Minecraft-style
+// worlds with limited vertical extent, I recommend using a single
+// chunk for the entire height, which limits the height to 255 blocks
+// (one less than Minecraft), and only chunk the map in X & Y.
+
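The caveview-style padded-buffer arrangement described above can be sketched as plain index arithmetic (the dimension and helper names are illustrative, not part of the stbvox API):

```c
// A 34x34x18 input buffer holds voxels (-1..32, -1..32, -1..16) for a
// 32x32x16 chunk: one voxel of padding on every side. Z values are stored
// consecutively, as stbvox requires; the strides are in elements, as
// passed to stbvox_set_input_stride.
enum { ZDIM = 18, YDIM = 34, XDIM = 34 };
enum { Z_STRIDE = 1, Y_STRIDE = ZDIM, X_STRIDE = ZDIM * YDIM };

// Index of voxel (x,y,z) in chunk coordinates, where -1 is the padding layer.
static int voxel_index(int x, int y, int z)
{
   return (x+1)*X_STRIDE + (y+1)*Y_STRIDE + (z+1)*Z_STRIDE;
}
```

You would then point the input description at the array base plus voxel_index(0,0,0) and call stbvox_set_input_stride(mm, X_STRIDE, Y_STRIDE), so mesher coordinate (0,0,0) lands on the chunk's first interior voxel while the out-of-range accesses land in the padding.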
+STBVXDEC int stbvox_make_mesh(stbvox_mesh_maker *mm);
+// Call this function to create mesh data for the currently configured
+// set of input data. This appends to the currently configured mesh output
+// buffer. Returns 1 on success. If there is not enough room in the buffer,
+// it outputs as much as it can, and returns 0; you need to switch output
+// buffers (either by calling stbvox_set_buffer to set new buffers, or
+// by copying the data out and calling stbvox_reset_buffers), and then
+// call this function again without changing any of the input parameters.
+//
+// Note that this function appends; you can call it multiple times to
+// build a single mesh. For example, caveview uses chunks that are
+// 32x32x255, but builds the mesh for it by processing 32x32x16 at a time
+// (this is faster as it reuses the same 34x34x18 input buffers rather
+// than needing 34x34x257 input buffers).
+
+// Once you're done creating a mesh into a given buffer,
+// consider the following functions:
+
+STBVXDEC int stbvox_get_quad_count(stbvox_mesh_maker *mm, int mesh);
+// Returns the number of quads in the mesh currently generated by mm.
+// This is the sum of all consecutive stbvox_make_mesh runs appending
+// to the same buffer. 'mesh' distinguishes between the multiple user
+// meshes available via 'selector' or stbvox_set_default_mesh.
+//
+// Typically you use this function when you're done building the mesh
+// and want to record how to draw it.
+//
+// Note that there are no index buffers; the data stored in the buffers
+// should be drawn as quads (e.g. with GL_QUADS); if your API does not
+// support quads, you can create a single index buffer large enough to
+// draw your largest vertex buffer, and reuse it for every rendering.
+// (Note that if you use 32-bit indices, you'll use 24 bytes of bandwidth
+// per quad, more than the 20 bytes for the vertex/face mesh data.)
+
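For an API without quad primitives, the single reusable index buffer suggested above is straightforward to build: six indices (two triangles) per quad, with each quad owning four consecutive vertices. A sketch (the function name is illustrative):

```c
// Fill 'indices' (6 entries per quad) so that quads 0..num_quads-1, each
// with vertices 4q..4q+3, are drawn as the two triangles (0,1,2) and
// (0,2,3) within the quad, preserving the quad's winding order.
static void build_quad_indices(unsigned int *indices, int num_quads)
{
   int q;
   for (q=0; q < num_quads; ++q) {
      unsigned int v = (unsigned int) q * 4;
      indices[q*6+0] = v+0;
      indices[q*6+1] = v+1;
      indices[q*6+2] = v+2;
      indices[q*6+3] = v+0;
      indices[q*6+4] = v+2;
      indices[q*6+5] = v+3;
   }
}
```

Built once for your largest vertex buffer, this can be bound for every mesh; draw quad_count*6 indices per mesh.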
+STBVXDEC void stbvox_set_mesh_coordinates(stbvox_mesh_maker *mm, int x, int y, int z);
+// Sets the global coordinates for this chunk, such that (0,0,0) relative
+// coordinates will be at (x,y,z) in global coordinates.
+
+STBVXDEC void stbvox_get_bounds(stbvox_mesh_maker *mm, float bounds[2][3]);
+// Returns the bounds for the mesh in global coordinates. Use this
+// for e.g. frustum culling the mesh. @BUG: this just uses the
+// values from stbvox_set_input_range(), so if you build by
+// appending multiple values, this will be wrong, and you need to
+// set stbvox_set_input_range() to the full size. Someday this
+// will switch to tracking the actual bounds of the *mesh*, though.
+
+STBVXDEC void stbvox_get_transform(stbvox_mesh_maker *mm, float transform[3][3]);
+// Returns the 'transform' data for the shader uniforms. It is your
+// job to set this to the shader before drawing the mesh. It is the
+// only uniform that needs to change per-mesh. Note that it is not
+// a 3x3 matrix, but rather a scale to decode fixed point numbers as
+// floats, a translate from relative to global space, and a special
+// translation for texture coordinate generation that avoids
+// floating-point precision issues. @TODO: currently we add the
+// global translation to the vertex, then multiply by modelview,
+// but this means if camera location and vertex are far from the
+// origin, we lose precision. Need to make a special modelview with
+// the translation (or some of it) factored out to avoid this.
+
+STBVXDEC void stbvox_reset_buffers(stbvox_mesh_maker *mm);
+// Call this function if you're done with the current output buffer
+// but want to reuse it (e.g. you're done appending with
+// stbvox_make_mesh and you've copied the data out to your graphics API
+// so can reuse the buffer).
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// RENDERING
+//
+
+STBVXDEC char *stbvox_get_vertex_shader(void);
+// Returns the (currently GLSL-only) vertex shader.
+
+STBVXDEC char *stbvox_get_fragment_shader(void);
+// Returns the (currently GLSL-only) fragment shader.
+// You can override the lighting and fogging calculations
+// by appending data to the end of these; see the #define
+// documentation for more information.
+
+STBVXDEC char *stbvox_get_fragment_shader_alpha_only(void);
+// Returns a slightly cheaper fragment shader that computes
+// alpha but not color. This is useful for e.g. a depth-only
+// pass when using alpha test.
+
+typedef struct stbvox_uniform_info stbvox_uniform_info;
+
+STBVXDEC int stbvox_get_uniform_info(stbvox_uniform_info *info, int uniform);
+// Gets the information about a uniform necessary for you to
+// set up each uniform with a minimal amount of explicit code.
+// See the sample code after the structure definition for stbvox_uniform_info,
+// further down in this header section.
+//
+// "uniform" is from the list immediately following. For many
+// of these, default values are provided which you can set.
+// Most values are shared for most draw calls; e.g. for stateful
+// APIs you can set most of the state only once. Only
+// STBVOX_UNIFORM_transform needs to change per draw call.
+//
+// STBVOX_UNIFORM_texscale
+// 64- or 128-long vec4 array. (128 only if STBVOX_CONFIG_PREFER_TEXBUFFER)
+// x: scale factor to apply to texture #1. must be a power of two. 1.0 means 'face-sized'
+// y: scale factor to apply to texture #2. must be a power of two. 1.0 means 'face-sized'
+// z: blend mode indexed by texture #2. 0.0 is alpha compositing; 1.0 is multiplication.
+// w: unused currently. @TODO use to support texture animation?
+//
+// Texscale is indexed by the bottom 6 or 7 bits of the texture id; thus for
+// example the texture at index 0 in the array and the texture in index 128 of
+// the array must be scaled the same. This means that if you only have 64 or 128
+// unique textures, they all get distinct values anyway; otherwise you have
+// to group them in pairs or sets of four.
+//
+// STBVOX_UNIFORM_ambient
+// 4-long vec4 array:
+// ambient[0].xyz - negative of direction of a directional light for half-lambert
+// ambient[1].rgb - color of light scaled by NdotL (can be negative)
+// ambient[2].rgb - constant light added to above calculation;
+// effectively light ranges from ambient[2]-ambient[1] to ambient[2]+ambient[1]
+// ambient[3].rgb - fog color for STBVOX_CONFIG_FOG_SMOOTHSTEP
+// ambient[3].a - reciprocal of squared distance of farthest fog point (viewing distance)
+
+
+ // +----- has a default value
+ // | +-- you should always use the default value
+enum // V V
+{ // ------------------------------------------------
+ STBVOX_UNIFORM_face_data, // n the sampler with the face texture buffer
+ STBVOX_UNIFORM_transform, // n the transform data from stbvox_get_transform
+ STBVOX_UNIFORM_tex_array, // n an array of two texture samplers containing the two texture arrays
+ STBVOX_UNIFORM_texscale, // Y a table of texture properties, see above
+ STBVOX_UNIFORM_color_table, // Y 64 vec4 RGBA values; a default palette is provided; if A > 1.0, fullbright
+ STBVOX_UNIFORM_normals, // Y Y table of normals, internal-only
+ STBVOX_UNIFORM_texgen, // Y Y table of texgen vectors, internal-only
+ STBVOX_UNIFORM_ambient, // n lighting & fog info, see above
+ STBVOX_UNIFORM_camera_pos, // Y camera position in global voxel space (for lighting & fog)
+
+ STBVOX_UNIFORM_count,
+};
+
+enum
+{
+ STBVOX_UNIFORM_TYPE_none,
+ STBVOX_UNIFORM_TYPE_sampler,
+ STBVOX_UNIFORM_TYPE_vec2,
+ STBVOX_UNIFORM_TYPE_vec3,
+ STBVOX_UNIFORM_TYPE_vec4,
+};
+
+struct stbvox_uniform_info
+{
+ int type; // which type of uniform
+ int bytes_per_element; // the size of each uniform array element (e.g. vec3 = 12 bytes)
+ int array_length; // length of the uniform array
+ char *name; // name in the shader @TODO use numeric binding
+ float *default_value; // if not NULL, you can use this as the uniform pointer
+ int use_tex_buffer; // if true, then the uniform is a sampler but the data can come from default_value
+};
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Uniform sample code
+//
+
+#if 0
+// Run this once per frame before drawing all the meshes.
+// You still need to separately set the 'transform' uniform for every mesh.
+void setup_uniforms(GLuint shader, float camera_pos[4], GLuint tex1, GLuint tex2)
+{
+ int i;
+ glUseProgram(shader); // so uniform binding works
+ for (i=0; i < STBVOX_UNIFORM_count; ++i) {
+ stbvox_uniform_info sui;
+ if (stbvox_get_uniform_info(&sui, i)) {
+ GLint loc = glGetUniformLocation(shader, sui.name);
+ if (loc != -1) {
+ switch (i) {
+ case STBVOX_UNIFORM_camera_pos: // only needed for fog
+ glUniform4fv(loc, sui.array_length, camera_pos);
+ break;
+
+ case STBVOX_UNIFORM_tex_array: {
+ GLuint tex_unit[2] = { 0, 1 }; // your choice of samplers
+ glUniform1iv(loc, 2, tex_unit);
+
+ glActiveTexture(GL_TEXTURE0 + tex_unit[0]); glBindTexture(GL_TEXTURE_2D_ARRAY, tex1);
+ glActiveTexture(GL_TEXTURE0 + tex_unit[1]); glBindTexture(GL_TEXTURE_2D_ARRAY, tex2);
+ glActiveTexture(GL_TEXTURE0); // reset to default
+ break;
+ }
+
+ case STBVOX_UNIFORM_face_data:
+ glUniform1i(loc, SAMPLER_YOU_WILL_BIND_PER_MESH_FACE_DATA_TO);
+ break;
+
+ case STBVOX_UNIFORM_ambient: // you definitely want to override this
+ case STBVOX_UNIFORM_color_table: // you might want to override this
+ case STBVOX_UNIFORM_texscale: // you may want to override this
+ glUniform4fv(loc, sui.array_length, sui.default_value);
+ break;
+
+ case STBVOX_UNIFORM_normals: // you never want to override this
+ case STBVOX_UNIFORM_texgen: // you never want to override this
+ glUniform3fv(loc, sui.array_length, sui.default_value);
+ break;
+ }
+ }
+ }
+ }
+}
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// INPUT TO MESHING
+//
+
+// Shapes of blocks that aren't always cubes
+enum
+{
+ STBVOX_GEOM_empty,
+ STBVOX_GEOM_knockout, // creates a hole in the mesh
+ STBVOX_GEOM_solid,
+ STBVOX_GEOM_transp, // solid geometry, but transparent contents so neighbors generate normally, unless same blocktype
+
+ // following 4 can be represented by vheight as well
+ STBVOX_GEOM_slab_upper,
+ STBVOX_GEOM_slab_lower,
+ STBVOX_GEOM_floor_slope_north_is_top,
+ STBVOX_GEOM_ceil_slope_north_is_bottom,
+
+ STBVOX_GEOM_floor_slope_north_is_top_as_wall_UNIMPLEMENTED, // same as floor_slope above, but uses wall's texture & texture projection
+ STBVOX_GEOM_ceil_slope_north_is_bottom_as_wall_UNIMPLEMENTED,
+ STBVOX_GEOM_crossed_pair, // corner-to-corner pairs, with normal vector bumped upwards
+ STBVOX_GEOM_force, // like GEOM_transp, but faces visible even if neighbor is same type, e.g. minecraft fancy leaves
+
+ // these access vheight input
+ STBVOX_GEOM_floor_vheight_03 = 12, // diagonal is SW-NE
+ STBVOX_GEOM_floor_vheight_12, // diagonal is SE-NW
+ STBVOX_GEOM_ceil_vheight_03,
+ STBVOX_GEOM_ceil_vheight_12,
+
+ STBVOX_GEOM_count, // number of geom cases
+};
+
+enum
+{
+ STBVOX_FACE_east,
+ STBVOX_FACE_north,
+ STBVOX_FACE_west,
+ STBVOX_FACE_south,
+ STBVOX_FACE_up,
+ STBVOX_FACE_down,
+
+ STBVOX_FACE_count,
+};
+
+#ifdef STBVOX_CONFIG_BLOCKTYPE_SHORT
+typedef unsigned short stbvox_block_type;
+#else
+typedef unsigned char stbvox_block_type;
+#endif
+
+// 24-bit color
+typedef struct
+{
+ unsigned char r,g,b;
+} stbvox_rgb;
+
+#define STBVOX_COLOR_TEX1_ENABLE 64
+#define STBVOX_COLOR_TEX2_ENABLE 128
+
+// This is the data structure you fill out. Most of the arrays can be
+// NULL, except when one is required to get the value to index another.
+//
+// The compass system used in the following descriptions is:
+// east means increasing x
+// north means increasing y
+// up means increasing z
+struct stbvox_input_description
+{
+ unsigned char lighting_at_vertices;
+ // The default is lighting values (i.e. ambient occlusion) are at block
+ // center, and the vertex light is gathered from those adjacent block
+ // centers that the vertex is facing. This makes smooth lighting
+ // consistent across adjacent faces with the same orientation.
+ //
+ // Setting this flag to non-zero gives you explicit control
+ // of light at each vertex, but now the lighting/ao will be
+ // shared by all vertices at the same point, even if they
+ // have different normals.
+
+ // these are mostly 3D maps you use to define your voxel world, using x_stride and y_stride
+ // note that for cache efficiency, you want to use the block_foo palettes as much as possible instead
+
+ stbvox_rgb *rgb;
+ // Indexed by 3D coordinate.
+ // 24-bit voxel color for STBVOX_CONFIG_MODE = 20 or 21 only
+
+ unsigned char *lighting;
+ // Indexed by 3D coordinate. The lighting value / ambient occlusion
+ // value that is used to define the vertex lighting values.
+ // The raw lighting values are defined at the center of blocks
+ // (or at vertex if 'lighting_at_vertices' is true).
+ //
+ // If the macro STBVOX_CONFIG_ROTATION_IN_LIGHTING is defined,
+ // then an additional 2-bit block rotation value is stored
+ // in this field as well.
+ //
+ // Encode with STBVOX_MAKE_LIGHTING_EXT(lighting,rot)--here
+ // 'lighting' should still be 8 bits, as the macro will
+ // discard the bottom bits automatically. Similarly, if
+ // using STBVOX_CONFIG_VHEIGHT_IN_LIGHTING, encode with
+ // STBVOX_MAKE_LIGHTING_EXT(lighting,vheight).
+ //
+ // (Rationale: rotation needs to be independent of blocktype,
+ // but is only 2 bits so doesn't want to be its own array.
+ // Lighting is the one thing that was likely to already be
+ // in use and that I could easily steal 2 bits from.)
+
+ stbvox_block_type *blocktype;
+ // Indexed by 3D coordinate. This is a core "block type" value, which is used
+ // to index into other arrays; essentially a "palette". This is much more
+ // memory-efficient and performance-friendly than storing the values explicitly,
+ // but only makes sense if the values are always synchronized.
+ //
+ // If a voxel's blocktype is 0, it is assumed to be empty (STBVOX_GEOM_empty),
+ // and no other blocktypes should be STBVOX_GEOM_empty. (Only if you do not
+   // have blocktypes should STBVOX_GEOM_empty ever be used.)
+ //
+ // Normally it is an unsigned byte, but you can override it to be
+ // a short if you have too many blocktypes.
+
+ unsigned char *geometry;
+ // Indexed by 3D coordinate. Contains the geometry type for the block.
+ // Also contains a 2-bit rotation for how the whole block is rotated.
+ // Also includes a 2-bit vheight value when using shared vheight values.
+ // See the separate vheight documentation.
+ // Encode with STBVOX_MAKE_GEOMETRY(geom, rot, vheight)
+
+ unsigned char *block_geometry;
+ // Array indexed by blocktype containing the geometry for this block, plus
+ // a 2-bit "simple rotation". Note rotation has limited use since it's not
+ // independent of blocktype.
+ //
+ // Encode with STBVOX_MAKE_GEOMETRY(geom,simple_rot,0)
+
+ unsigned char *block_tex1;
+ // Array indexed by blocktype containing the texture id for texture #1.
+
+ unsigned char (*block_tex1_face)[6];
+ // Array indexed by blocktype and face containing the texture id for texture #1.
+ // The N/E/S/W face choices can be rotated by one of the rotation selectors;
+ // The top & bottom face textures will rotate to match.
+ // Note that it only makes sense to use one of block_tex1 or block_tex1_face;
+ // this pattern repeats throughout and this notice is not repeated.
+
+ unsigned char *tex2;
+ // Indexed by 3D coordinate. Contains the texture id for texture #2
+ // to use on all faces of the block.
+
+ unsigned char *block_tex2;
+ // Array indexed by blocktype containing the texture id for texture #2.
+
+ unsigned char (*block_tex2_face)[6];
+ // Array indexed by blocktype and face containing the texture id for texture #2.
+ // The N/E/S/W face choices can be rotated by one of the rotation selectors;
+ // The top & bottom face textures will rotate to match.
+
+ unsigned char *color;
+ // Indexed by 3D coordinate. Contains the color for all faces of the block.
+ // The core color value is 0..63.
+ // Encode with STBVOX_MAKE_COLOR(color_number, tex1_enable, tex2_enable)
+
+ unsigned char *block_color;
+ // Array indexed by blocktype containing the color value to apply to the faces.
+ // The core color value is 0..63.
+ // Encode with STBVOX_MAKE_COLOR(color_number, tex1_enable, tex2_enable)
+
+ unsigned char (*block_color_face)[6];
+ // Array indexed by blocktype and face containing the color value to apply to that face.
+ // The core color value is 0..63.
+ // Encode with STBVOX_MAKE_COLOR(color_number, tex1_enable, tex2_enable)
+
+ unsigned char *block_texlerp;
+ // Array indexed by blocktype containing 3-bit scalar for texture #2 alpha
+ // (known throughout as 'texlerp'). This is constant over every face even
+ // though the property is potentially per-vertex.
+
+ unsigned char (*block_texlerp_face)[6];
+ // Array indexed by blocktype and face containing 3-bit scalar for texture #2 alpha.
+ // This is constant over the face even though the property is potentially per-vertex.
+
+ unsigned char *block_vheight;
+ // Array indexed by blocktype containing the vheight values for the
+ // top or bottom face of this block. These will rotate properly if the
+ // block is rotated. See discussion of vheight.
+ // Encode with STBVOX_MAKE_VHEIGHT(sw_height, se_height, nw_height, ne_height)
+
+ unsigned char *selector;
+ // Array indexed by 3D coordinates indicating which output mesh to select.
+
+ unsigned char *block_selector;
+ // Array indexed by blocktype indicating which output mesh to select.
+
+ unsigned char *side_texrot;
+ // Array indexed by 3D coordinates encoding 2-bit texture rotations for the
+ // faces on the E/N/W/S sides of the block.
+ // Encode with STBVOX_MAKE_SIDE_TEXROT(rot_e, rot_n, rot_w, rot_s)
+
+ unsigned char *block_side_texrot;
+ // Array indexed by blocktype encoding 2-bit texture rotations for the faces
+ // on the E/N/W/S sides of the block.
+ // Encode with STBVOX_MAKE_SIDE_TEXROT(rot_e, rot_n, rot_w, rot_s)
+
+ unsigned char *overlay; // index into palettes listed below
+ // Indexed by 3D coordinate. If 0, there is no overlay. If non-zero,
+   // it indexes into the below arrays and overrides the values
+ // defined by the blocktype.
+
+ unsigned char (*overlay_tex1)[6];
+ // Array indexed by overlay value and face, containing an override value
+ // for the texture id for texture #1. If 0, the value defined by blocktype
+ // is used.
+
+ unsigned char (*overlay_tex2)[6];
+ // Array indexed by overlay value and face, containing an override value
+ // for the texture id for texture #2. If 0, the value defined by blocktype
+ // is used.
+
+ unsigned char (*overlay_color)[6];
+ // Array indexed by overlay value and face, containing an override value
+ // for the face color. If 0, the value defined by blocktype is used.
+
+ unsigned char *overlay_side_texrot;
+ // Array indexed by overlay value, encoding 2-bit texture rotations for the faces
+ // on the E/N/W/S sides of the block.
+ // Encode with STBVOX_MAKE_SIDE_TEXROT(rot_e, rot_n, rot_w, rot_s)
+
+ unsigned char *rotate;
+ // Indexed by 3D coordinate. Allows independent rotation of several
+ // parts of the voxel, where by rotation I mean swapping textures
+ // and colors between E/N/S/W faces.
+ // Block: rotates anything indexed by blocktype
+ // Overlay: rotates anything indexed by overlay
+ // EColor: rotates faces defined in ecolor_facemask
+ // Encode with STBVOX_MAKE_MATROT(block,overlay,ecolor)
+
+ unsigned char *tex2_for_tex1;
+ // Array indexed by tex1 containing the texture id for texture #2.
+ // You can use this if the two are always/almost-always strictly
+ // correlated (e.g. if tex2 is a detail texture for tex1), as it
+ // will be more efficient (touching fewer cache lines) than using
+ // e.g. block_tex2_face.
+
+ unsigned char *tex2_replace;
+ // Indexed by 3D coordinate. Specifies the texture id for texture #2
+ // to use on a single face of the voxel, which must be E/N/W/S (not U/D).
+ // The texture id is limited to 6 bits unless tex2_facemask is also
+ // defined (see below).
+ // Encode with STBVOX_MAKE_TEX2_REPLACE(tex2, face)
+
+ unsigned char *tex2_facemask;
+ // Indexed by 3D coordinate. Specifies which of the six faces should
+ // have their tex2 replaced by the value of tex2_replace. In this
+ // case, all 8 bits of tex2_replace are used as the texture id.
+ // Encode with STBVOX_MAKE_FACE_MASK(east,north,west,south,up,down)
+
+ unsigned char *extended_color;
+ // Indexed by 3D coordinate. Specifies a value that indexes into
+ // the ecolor arrays below (both of which must be defined).
+
+ unsigned char *ecolor_color;
+ // Indexed by extended_color value, specifies an optional override
+ // for the color value on some faces.
+ // Encode with STBVOX_MAKE_COLOR(color_number, tex1_enable, tex2_enable)
+
+ unsigned char *ecolor_facemask;
+ // Indexed by extended_color value, this specifies which faces the
+ // color in ecolor_color should be applied to. The faces can be
+ // independently rotated by the ecolor value of 'rotate', if it exists.
+ // Encode with STBVOX_MAKE_FACE_MASK(e,n,w,s,u,d)
+
+ unsigned char *color2;
+ // Indexed by 3D coordinates, specifies an alternative color to apply
+ // to some of the faces of the block.
+ // Encode with STBVOX_MAKE_COLOR(color_number, tex1_enable, tex2_enable)
+
+ unsigned char *color2_facemask;
+ // Indexed by 3D coordinates, specifies which faces should use the
+ // color defined in color2. No rotation value is applied.
+ // Encode with STBVOX_MAKE_FACE_MASK(e,n,w,s,u,d)
+
+ unsigned char *color3;
+ // Indexed by 3D coordinates, specifies an alternative color to apply
+ // to some of the faces of the block.
+ // Encode with STBVOX_MAKE_COLOR(color_number, tex1_enable, tex2_enable)
+
+ unsigned char *color3_facemask;
+ // Indexed by 3D coordinates, specifies which faces should use the
+ // color defined in color3. No rotation value is applied.
+ // Encode with STBVOX_MAKE_FACE_MASK(e,n,w,s,u,d)
+
+ unsigned char *texlerp_simple;
+ // Indexed by 3D coordinates, this is the smallest texlerp encoding
+   // that can do useful work. It consists of three values: baselerp,
+ // vertlerp, and face_vertlerp. Baselerp defines the value
+ // to use on all of the faces but one, from the STBVOX_TEXLERP_BASE
+ // values. face_vertlerp is one of the 6 face values (or STBVOX_FACE_NONE)
+ // which specifies the face should use the vertlerp values.
+ // Vertlerp defines a lerp value at every vertex of the mesh.
+ // Thus, one face can have per-vertex texlerp values, and those
+ // values are encoded in the space so that they will be shared
+ // by adjacent faces that also use vertlerp, allowing continuity
+ // (this is used for the "texture crossfade" bit of the release video).
+ // Encode with STBVOX_MAKE_TEXLERP_SIMPLE(baselerp, vertlerp, face_vertlerp)
+
+ // The following texlerp encodings are experimental and maybe not
+ // that useful.
+
+ unsigned char *texlerp;
+ // Indexed by 3D coordinates, this defines four values:
+ // vertlerp is a lerp value at every vertex of the mesh (using STBVOX_TEXLERP_BASE values).
+ // ud is the value to use on up and down faces, from STBVOX_TEXLERP_FACE values
+ // ew is the value to use on east and west faces, from STBVOX_TEXLERP_FACE values
+ // ns is the value to use on north and south faces, from STBVOX_TEXLERP_FACE values
+ // If any of ud, ew, or ns is STBVOX_TEXLERP_FACE_use_vert, then the
+ // vertlerp values for the vertices are gathered and used for those faces.
+   // Encode with STBVOX_MAKE_TEXLERP(ns,ew,ud,vertlerp)
+
+ unsigned short *texlerp_vert3;
+ // Indexed by 3D coordinates, this works with texlerp and
+ // provides a unique texlerp value for every direction at
+ // every vertex. The same rules of whether faces share values
+ // applies. The STBVOX_TEXLERP_FACE vertlerp value defined in
+ // texlerp is only used for the down direction. The values at
+ // each vertex in other directions are defined in this array,
+ // and each uses the STBVOX_TEXLERP3 values (i.e. full precision
+ // 3-bit texlerp values).
+ // Encode with STBVOX_MAKE_VERT3(vertlerp_e,vertlerp_n,vertlerp_w,vertlerp_s,vertlerp_u)
+
+ unsigned short *texlerp_face3; // e:3,n:3,w:3,s:3,u:2,d:2
+ // Indexed by 3D coordinates, this provides a compact way to
+   // fully specify the texlerp value independently for every face,
+ // but doesn't allow per-vertex variation. E/N/W/S values are
+ // encoded using STBVOX_TEXLERP3 values, whereas up and down
+ // use STBVOX_TEXLERP_SIMPLE values.
+ // Encode with STBVOX_MAKE_FACE3(face_e,face_n,face_w,face_s,face_u,face_d)
+
+ unsigned char *vheight; // STBVOX_MAKE_VHEIGHT -- sw:2, se:2, nw:2, ne:2, doesn't rotate
+ // Indexed by 3D coordinates, this defines the four
+ // vheight values to use if the geometry is STBVOX_GEOM_vheight*.
+ // See the vheight discussion.
+
+ unsigned char *packed_compact;
+ // Stores block rotation, vheight, and texlerp values:
+ // block rotation: 2 bits
+ // vertex vheight: 2 bits
+ // use_texlerp : 1 bit
+ // vertex texlerp: 3 bits
+ // If STBVOX_CONFIG_UP_TEXLERP_PACKED is defined, then 'vertex texlerp' is
+ // used for up faces if use_texlerp is 1. If STBVOX_CONFIG_DOWN_TEXLERP_PACKED
+ // is defined, then 'vertex texlerp' is used for down faces if use_texlerp is 1.
+ // Note if those symbols are defined but packed_compact is NULL, the normal
+ // texlerp default will be used.
+ // Encode with STBVOX_MAKE_PACKED_COMPACT(rot, vheight, texlerp, use_texlerp)
+};
+// @OPTIMIZE allow specializing; build a single struct with all of the
+// 3D-indexed arrays combined so it's AoS instead of SoA for better
+// cache efficiency
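// As a concrete check of the packed_compact layout documented above, the
// following sketch mirrors the encoding (bit positions inferred from the
// field order listed: rot in bits 0-1, vheight in bits 2-3, use_texlerp
// in bit 4, texlerp in bits 5-7; the macro name here is illustrative):

```c
/* mirrors STBVOX_MAKE_PACKED_COMPACT; bit layout inferred from the
   field list documented above */
#define PACKED_COMPACT(rot, vheight, texlerp, use_texlerp) \
   ((rot) + 4*(vheight) + 16*(use_texlerp) + 32*(texlerp))

/* e.g. rotation 1, vheight 2, texlerp 3, use_texlerp on:
   1 + 4*2 + 16*1 + 32*3 = 121 */
```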
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// VHEIGHT DOCUMENTATION
+//
+// "vheight" is the internal name for the special block types
+// with sloped tops or bottoms. "vheight" stands for "vertex height".
+//
+// Note that these blocks are very flexible (there are 256 of them,
+// although at least 17 of them should never be used), but they
+// also have a disadvantage that they generate extra invisible
+// faces; the generator does not currently detect whether adjacent
+// vheight blocks hide each other's sides, so those side faces are
+// always generated. For a continuous ground terrain, this means
+// that you may generate 5x as many quads as needed. See notes
+// on "improvements for shipping products" in the introduction.
+
+enum
+{
+ STBVOX_VERTEX_HEIGHT_0,
+ STBVOX_VERTEX_HEIGHT_half,
+ STBVOX_VERTEX_HEIGHT_1,
+ STBVOX_VERTEX_HEIGHT_one_and_a_half,
+};
+// These are the "vheight" values. Vheight stands for "vertex height".
+// The idea is that for a "floor vheight" block, you take a cube and
+// reposition the top-most vertices at various heights as specified by
+// the vheight values. Similarly, a "ceiling vheight" block takes a
+// cube and repositions the bottom-most vertices.
+//
+// A floor block only adjusts the top four vertices; the bottom four vertices
+// remain at the bottom of the block. The height values are 2 bits,
+// measured in halves of a block; so you can specify heights of 0/2,
+// 1/2, 2/2, or 3/2. 0 is the bottom of the block, 1 is halfway
+// up the block, 2 is the top of the block, and 3 is halfway up the
+// next block (and actually outside of the block). The value 3 is
+// actually legal for floor vheight (but not ceiling), and allows you to:
+//
+// (A) have smoother terrain by having slopes that cross blocks,
+// e.g. (1,1,3,3) is a regular-seeming slope halfway between blocks
+// (B) make slopes steeper than 45-degrees, e.g. (0,0,3,3)
+//
+// (Because only z coordinates have half-block precision, and x&y are
+// limited to block corner precision, it's not possible to make these
+// things "properly" out of blocks, e.g. a half-slope block on its side
+// or a sloped block halfway between blocks that's made out of two blocks.)
+//
+// If you define STBVOX_CONFIG_OPTIMIZED_VHEIGHT, then the top face
+// (or bottom face for a ceiling vheight block) will be drawn as a
+// single quad even if the four vertex heights aren't planar, and a
+// single normal will be used over the entire quad. If you
+// don't define it, then if the top face is non-planar, it will be
+// split into two triangles, each with their own normal/lighting.
+// (Note that since all output from stb_voxel_render is quad meshes,
+// triangles are actually rendered as degenerate quads.) In this case,
+// the distinction between STBVOX_GEOM_floor_vheight_03 and
+// STBVOX_GEOM_floor_vheight_12 comes into play; the former introduces
+// an edge from the SW to NE corner (i.e. from <0,0,?> to <1,1,?>),
+// while the latter introduces an edge from the NW to SE corner
+// (i.e. from <0,1,?> to <1,0,?>.) For a "lazy mesh" look, use
+// exclusively _03 or _12. For a "classic mesh" look, alternate
+// _03 and _12 in a checkerboard pattern. For a "smoothest surface"
+// look, choose the edge based on actual vertex heights.
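// The edge-selection strategies above can be sketched as follows (the
// "smoothest surface" rule is one plausible heuristic, not something the
// library prescribes; function names are illustrative):

```c
#include <stdlib.h> // abs

enum { GEOM_floor_vheight_03 = 12, GEOM_floor_vheight_12 = 13 }; // values from the geometry enum

// "classic mesh": checkerboard the split diagonal by block parity
static int classic_floor_geom(int x, int y)
{
   return ((x ^ y) & 1) ? GEOM_floor_vheight_12 : GEOM_floor_vheight_03;
}

// "smoothest surface" heuristic: put the edge through the diagonal whose
// endpoint heights differ least
static int smooth_floor_geom(int h_sw, int h_se, int h_nw, int h_ne)
{
   return (abs(h_sw - h_ne) <= abs(h_nw - h_se))
        ? GEOM_floor_vheight_03   // edge from SW to NE
        : GEOM_floor_vheight_12;  // edge from NW to SE
}
```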
+//
+// The four vertex heights can come from several places. The simplest
+// encoding is to just use the 'vheight' parameter which stores four
+// explicit vertex heights for every block. This allows total independence,
+// but at the cost of the largest memory usage, 1 byte per 3D block.
+// Encode this with STBVOX_MAKE_VHEIGHT(vh_sw, vh_se, vh_nw, vh_ne).
+// These coordinates are absolute, not affected by block rotations.
+//
+// An alternative if you just want to encode some very specific block
+// types, not all the possibilities--say you just want half-height slopes,
+// so you want (0,0,1,1) and (1,1,2,2)--then you can use block_vheight
+// to specify them. The geometry rotation will cause block_vheight values
+// to be rotated (because it's as if you're just defining a type of
+// block). This value is also encoded with STBVOX_MAKE_VHEIGHT.
+//
+// If you want to save memory and you're creating a "continuous ground"
+// sort of effect, you can make each vertex of the lattice share the
+// vheight value; that is, two adjacent blocks that share a vertex will
+// always get the same vheight value for that vertex. Then you need to
+// store two bits of vheight for every block, which you do by storing it
+// as part of another data structure. Store the south-west vertex's vheight
+// with the block. You can either use the "geometry" mesh variable (it's
+// a parameter to STBVOX_MAKE_GEOMETRY) or you can store it in the
+// "lighting" mesh variable if you defined STBVOX_CONFIG_VHEIGHT_IN_LIGHTING,
+// using STBVOX_MAKE_LIGHTING_EXT(lighting,vheight).
+//
+// Note that if you start with a 2D height map and generate vheight data from
+// it, you don't necessarily store only one value per (x,y) coordinate,
+// as the same value may need to be set up at multiple z heights. For
+// example, if height(8,8) = 13.5, then you want the block at (8,8,13)
+// to store STBVOX_VERTEX_HEIGHT_half, and this will be used by blocks
+// at (7,7,13), (8,7,13), (7,8,13), and (8,8,13). However, if you're
+// allowing steep slopes, it might be the case that you have a block
+// at (7,7,12) which is supposed to stick up to 13.5; that means
+// you also need to store STBVOX_VERTEX_HEIGHT_one_and_a_half at (8,8,12).
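// The heightmap example above can be sketched as a small helper (a
// hypothetical app-side function, not part of the library):

```c
// For a terrain vertex whose height is 'h' (in blocks, with half-block
// precision), return the 2-bit vheight value to store at block z, or -1
// if the vertex is outside that block's representable 0..3 range.
static int vheight_for_block(float h, int z)
{
   int halves = (int)(h * 2.0f + 0.5f); // height in half-blocks, rounded
   int v = halves - 2*z;
   return (v >= 0 && v <= 3) ? v : -1;
}
// e.g. height 13.5: the block at z=13 stores 1 (STBVOX_VERTEX_HEIGHT_half),
// and a steep neighbor at z=12 stores 3 (STBVOX_VERTEX_HEIGHT_one_and_a_half)
```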
+
+enum
+{
+ STBVOX_TEXLERP_FACE_0,
+ STBVOX_TEXLERP_FACE_half,
+ STBVOX_TEXLERP_FACE_1,
+ STBVOX_TEXLERP_FACE_use_vert,
+};
+
+enum
+{
+ STBVOX_TEXLERP_BASE_0, // 0.0
+ STBVOX_TEXLERP_BASE_2_7, // 2/7
+   STBVOX_TEXLERP_BASE_5_7, // 5/7
+ STBVOX_TEXLERP_BASE_1 // 1.0
+};
+
+enum
+{
+ STBVOX_TEXLERP3_0_8,
+ STBVOX_TEXLERP3_1_8,
+ STBVOX_TEXLERP3_2_8,
+ STBVOX_TEXLERP3_3_8,
+ STBVOX_TEXLERP3_4_8,
+ STBVOX_TEXLERP3_5_8,
+ STBVOX_TEXLERP3_6_8,
+ STBVOX_TEXLERP3_7_8,
+};
+
+#define STBVOX_FACE_NONE 7
+
+#define STBVOX_BLOCKTYPE_EMPTY 0
+
+#ifdef STBVOX_BLOCKTYPE_SHORT
+#define STBVOX_BLOCKTYPE_HOLE 65535
+#else
+#define STBVOX_BLOCKTYPE_HOLE 255
+#endif
+
+#define STBVOX_MAKE_GEOMETRY(geom, rotate, vheight) ((geom) + (rotate)*16 + (vheight)*64)
+#define STBVOX_MAKE_VHEIGHT(v_sw, v_se, v_nw, v_ne) ((v_sw) + (v_se)*4 + (v_nw)*16 + (v_ne)*64)
+#define STBVOX_MAKE_MATROT(block, overlay, color) ((block) + (overlay)*4 + (color)*64)
+#define STBVOX_MAKE_TEX2_REPLACE(tex2, tex2_replace_face) ((tex2) + ((tex2_replace_face) & 3)*64)
+#define STBVOX_MAKE_TEXLERP(ns2, ew2, ud2, vert) ((ew2) + (ns2)*4 + (ud2)*16 + (vert)*64)
+#define STBVOX_MAKE_TEXLERP_SIMPLE(baselerp,vert,face) ((vert)*32 + (face)*4 + (baselerp))
+#define STBVOX_MAKE_TEXLERP1(vert,e2,n2,w2,s2,u4,d2) STBVOX_MAKE_TEXLERP(s2, w2, d2, vert)
+#define STBVOX_MAKE_TEXLERP2(vert,e2,n2,w2,s2,u4,d2) ((u4)*16 + (n2)*4 + (e2))
+#define STBVOX_MAKE_FACE_MASK(e,n,w,s,u,d) ((e)+(n)*2+(w)*4+(s)*8+(u)*16+(d)*32)
+#define STBVOX_MAKE_SIDE_TEXROT(e,n,w,s) ((e)+(n)*4+(w)*16+(s)*64)
+#define STBVOX_MAKE_COLOR(color,t1,t2) ((color)+(t1)*64+(t2)*128)
+#define STBVOX_MAKE_TEXLERP_VERT3(e,n,w,s,u) ((e)+(n)*8+(w)*64+(s)*512+(u)*4096)
+#define STBVOX_MAKE_TEXLERP_FACE3(e,n,w,s,u,d) ((e)+(n)*8+(w)*64+(s)*512+(u)*4096+(d)*16384)
+#define STBVOX_MAKE_PACKED_COMPACT(rot, vheight, texlerp, use_texlerp) ((rot)+4*(vheight)+16*(use_texlerp)+32*(texlerp))
+
+#define STBVOX_MAKE_LIGHTING_EXT(lighting, rot) (((lighting)&~3)+(rot))
+#define STBVOX_MAKE_LIGHTING(lighting) (lighting)
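// Worked examples of the encoding macros (re-stated under shorter names so
// the example is self-contained; the values are computed by hand from the
// definitions above):

```c
#define MAKE_GEOMETRY(geom, rotate, vheight) ((geom) + (rotate)*16 + (vheight)*64)
#define MAKE_VHEIGHT(v_sw, v_se, v_nw, v_ne) ((v_sw) + (v_se)*4 + (v_nw)*16 + (v_ne)*64)
#define MAKE_FACE_MASK(e,n,w,s,u,d)          ((e)+(n)*2+(w)*4+(s)*8+(u)*16+(d)*32)
#define MAKE_LIGHTING_EXT(lighting, rot)     (((lighting)&~3)+(rot))

/* a solid block (geom 2) rotated twice:            2 + 2*16       = 34
   a half-height slope rising northward (0,0,1,1):  1*16 + 1*64    = 80
   a mask selecting the four side faces:            1 + 2 + 4 + 8  = 15
   lighting 200 packed with rotation 3:             (200 & ~3) + 3 = 203 */
```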
+
+#ifndef STBVOX_MAX_MESHES
+#define STBVOX_MAX_MESHES 2 // opaque & transparent
+#endif
+
+#define STBVOX_MAX_MESH_SLOTS 3 // one vertex & two faces, or two vertex and one face
+
+
+// don't mess with this directly, it's just here so you can
+// declare stbvox_mesh_maker on the stack or as a global
+struct stbvox_mesh_maker
+{
+ stbvox_input_description input;
+ int cur_x, cur_y, cur_z; // last unprocessed voxel if it splits into multiple buffers
+ int x0,y0,z0,x1,y1,z1;
+ int x_stride_in_bytes;
+ int y_stride_in_bytes;
+ int config_dirty;
+ int default_mesh;
+ unsigned int tags;
+
+ int cube_vertex_offset[6][4]; // this allows access per-vertex data stored block-centered (like texlerp, ambient)
+ int vertex_gather_offset[6][4];
+
+ int pos_x,pos_y,pos_z;
+ int full;
+
+ // computed from user input
+ char *output_cur [STBVOX_MAX_MESHES][STBVOX_MAX_MESH_SLOTS];
+ char *output_end [STBVOX_MAX_MESHES][STBVOX_MAX_MESH_SLOTS];
+ char *output_buffer[STBVOX_MAX_MESHES][STBVOX_MAX_MESH_SLOTS];
+ int output_len [STBVOX_MAX_MESHES][STBVOX_MAX_MESH_SLOTS];
+
+ // computed from config
+ int output_size [STBVOX_MAX_MESHES][STBVOX_MAX_MESH_SLOTS]; // per quad
+ int output_step [STBVOX_MAX_MESHES][STBVOX_MAX_MESH_SLOTS]; // per vertex or per face, depending
+ int num_mesh_slots;
+
+ float default_tex_scale[128][2];
+};
+
+#endif // INCLUDE_STB_VOXEL_RENDER_H
+
+
+#ifdef STB_VOXEL_RENDER_IMPLEMENTATION
+
+#include <stdlib.h>
+#include <assert.h>
+#include <string.h> // memset
+
+// have to use our own names to avoid the _MSC_VER path having conflicting type names
+#ifndef _MSC_VER
+   #include <stdint.h>
+ typedef uint16_t stbvox_uint16;
+ typedef uint32_t stbvox_uint32;
+#else
+ typedef unsigned short stbvox_uint16;
+ typedef unsigned int stbvox_uint32;
+#endif
+
+#ifdef _MSC_VER
+ #define STBVOX_NOTUSED(v) (void)(v)
+#else
+ #define STBVOX_NOTUSED(v) (void)sizeof(v)
+#endif
+
+
+
+#ifndef STBVOX_CONFIG_MODE
+#error "Must define STBVOX_CONFIG_MODE to select the mode"
+#endif
+
+#if defined(STBVOX_CONFIG_ROTATION_IN_LIGHTING) && defined(STBVOX_CONFIG_VHEIGHT_IN_LIGHTING)
+#error "Can't store both rotation and vheight in lighting"
+#endif
+
+
+// The following are candidate voxel modes. Only modes 0, 1, 20, and 21 are
+// currently implemented. Reducing the storage-per-quad further
+// shouldn't improve performance, although obviously it allows you
+// to create larger worlds without streaming.
+//
+//
+// ----------- Two textures ----------- -- One texture -- ---- Color only ----
+// Mode: 0 1 2 3 4 5 6 10 11 12 20 21 22 23 24
+// ============================================================================================================
+// uses Tex Buffer n Y Y Y Y Y Y Y Y Y n Y Y Y Y
+// bytes per quad 32 20 14 12 10 6 6 8 8 4 32 20 10 6 4
+// non-blocks all all some some some slabs stairs some some none all all slabs slabs none
+// tex1 256 256 256 256 256 256 256 256 256 256 n n n n n
+// tex2 256 256 256 256 256 256 128 n n n n n n n n
+// colors 64 64 64 64 64 64 64 8 n n 2^24 2^24 2^24 2^24 256
+// vertex ao Y Y Y Y Y n n Y Y n Y Y Y n n
+// vertex texlerp Y Y Y n n n n - - - - - - - -
+// x&y extents 127 127 128 64 64 128 64 64 128 128 127 127 128 128 128
+// z extents 255 255 128 64? 64? 64 64 32 64 128 255 255 128 64 128
+
+// not sure why I only wrote down the above "result data" and didn't preserve
+// the vertex formats, but here I've tried to reconstruct the designs...
+// mode # 3 is wrong, one byte too large, but that may have been an error originally
+
+// Mode: 0 1 2 3 4 5 6 10 11 12 20 21 22 23 24
+// =============================================================================================================
+// bytes per quad 32 20 14 12 10 6 6 8 8 4 20 10 6 4
+//
+// vertex x bits 7 7 0 6 0 0 0 0 0 0 7 0 0 0
+// vertex y bits 7 7 0 0 0 0 0 0 0 0 7 0 0 0
+// vertex z bits 9 9 7 4 2 0 0 2 2 0 9 2 0 0
+// vertex ao bits 6 6 6 6 6 0 0 6 6 0 6 6 0 0
+// vertex txl bits 3 3 3 0 0 0 0 0 0 0 (3) 0 0 0
+//
+// face tex1 bits (8) 8 8 8 8 8 8 8 8 8
+// face tex2 bits (8) 8 8 8 8 8 7 - - -
+// face color bits (8) 8 8 8 8 8 8 3 0 0 24 24 24 8
+// face normal bits (8) 8 8 8 6 4 7 4 4 3 8 3 4 3
+// face x bits 7 0 6 7 6 6 7 7 0 7 7 7
+// face y bits 7 6 6 7 6 6 7 7 0 7 7 7
+// face z bits 2 2 6 6 6 5 6 7 0 7 6 7
+
+
+#if STBVOX_CONFIG_MODE==0 || STBVOX_CONFIG_MODE==1
+
+ #define STBVOX_ICONFIG_VERTEX_32
+ #define STBVOX_ICONFIG_FACE1_1
+
+#elif STBVOX_CONFIG_MODE==20 || STBVOX_CONFIG_MODE==21
+
+ #define STBVOX_ICONFIG_VERTEX_32
+ #define STBVOX_ICONFIG_FACE1_1
+ #define STBVOX_ICONFIG_UNTEXTURED
+
+#else
+#error "Selected value of STBVOX_CONFIG_MODE is not supported"
+#endif
+
+#if STBVOX_CONFIG_MODE==0 || STBVOX_CONFIG_MODE==20
+#define STBVOX_ICONFIG_FACE_ATTRIBUTE
+#endif
+
+#ifndef STBVOX_CONFIG_HLSL
+// the fallback if all others are exhausted is GLSL
+#define STBVOX_ICONFIG_GLSL
+#endif
+
+#ifdef STBVOX_CONFIG_OPENGL_MODELVIEW
+#define STBVOX_ICONFIG_OPENGL_3_1_COMPATIBILITY
+#endif
+
+#if defined(STBVOX_ICONFIG_VERTEX_32)
+ typedef stbvox_uint32 stbvox_mesh_vertex;
+ #define stbvox_vertex_encode(x,y,z,ao,texlerp) \
+ ((stbvox_uint32) ((x)+((y)<<7)+((z)<<14)+((ao)<<23)+((texlerp)<<29)))
+#elif defined(STBVOX_ICONFIG_VERTEX_16_1) // mode=2
+ typedef stbvox_uint16 stbvox_mesh_vertex;
+ #define stbvox_vertex_encode(x,y,z,ao,texlerp) \
+      ((stbvox_uint16) ((z)+((ao)<<7)+((texlerp)<<13)))
+#elif defined(STBVOX_ICONFIG_VERTEX_16_2) // mode=3
+ typedef stbvox_uint16 stbvox_mesh_vertex;
+ #define stbvox_vertex_encode(x,y,z,ao,texlerp) \
+      ((stbvox_uint16) ((x)+((z)<<6)+((ao)<<10)))
+#elif defined(STBVOX_ICONFIG_VERTEX_8)
+ typedef stbvox_uint8 stbvox_mesh_vertex;
+ #define stbvox_vertex_encode(x,y,z,ao,texlerp) \
+      ((stbvox_uint8) ((z)+((ao)<<6)))
+#else
+ #error "internal error, no vertex type"
+#endif
+
+#ifdef STBVOX_ICONFIG_FACE1_1
+ typedef struct
+ {
+ unsigned char tex1,tex2,color,face_info;
+ } stbvox_mesh_face;
+#else
+ #error "internal error, no face type"
+#endif
+
+
+// 20-byte quad format:
+//
+// per vertex:
+//
+// x:7
+// y:7
+// z:9
+// ao:6
+// tex_lerp:3
+//
+// per face:
+//
+// tex1:8
+// tex2:8
+// face:8
+// color:8
+
+
+// Faces:
+//
+// Faces use the bottom 3 bits to choose the texgen
+// mode, and all the bits to choose the normal.
+// Thus the bottom 3 bits have to be:
+// e, n, w, s, u, d, u, d
+//
+// These use compact names so tables are readable
+
+enum
+{
+ STBVF_e,
+ STBVF_n,
+ STBVF_w,
+ STBVF_s,
+ STBVF_u,
+ STBVF_d,
+ STBVF_eu,
+ STBVF_ed,
+
+ STBVF_eu_wall,
+ STBVF_nu_wall,
+ STBVF_wu_wall,
+ STBVF_su_wall,
+ STBVF_ne_u,
+ STBVF_ne_d,
+ STBVF_nu,
+ STBVF_nd,
+
+ STBVF_ed_wall,
+ STBVF_nd_wall,
+ STBVF_wd_wall,
+ STBVF_sd_wall,
+ STBVF_nw_u,
+ STBVF_nw_d,
+ STBVF_wu,
+ STBVF_wd,
+
+ STBVF_ne_u_cross,
+ STBVF_nw_u_cross,
+ STBVF_sw_u_cross,
+ STBVF_se_u_cross,
+ STBVF_sw_u,
+ STBVF_sw_d,
+ STBVF_su,
+ STBVF_sd,
+
+ // @TODO we need more than 5 bits to encode the normal to fit the following
+ // so for now we use the right projection but the wrong normal
+ STBVF_se_u = STBVF_su,
+ STBVF_se_d = STBVF_sd,
+
+   STBVF_count
+};
+
+/////////////////////////////////////////////////////////////////////////////
+//
+// tables -- i'd prefer if these were at the end of the file, but: C++
+//
+
+static float stbvox_default_texgen[2][32][3] =
+{
+ { { 0, 1,0 }, { 0, 0, 1 }, { 0,-1,0 }, { 0, 0,-1 },
+ { -1, 0,0 }, { 0, 0, 1 }, { 1, 0,0 }, { 0, 0,-1 },
+ { 0,-1,0 }, { 0, 0, 1 }, { 0, 1,0 }, { 0, 0,-1 },
+ { 1, 0,0 }, { 0, 0, 1 }, { -1, 0,0 }, { 0, 0,-1 },
+
+ { 1, 0,0 }, { 0, 1, 0 }, { -1, 0,0 }, { 0,-1, 0 },
+ { -1, 0,0 }, { 0,-1, 0 }, { 1, 0,0 }, { 0, 1, 0 },
+ { 1, 0,0 }, { 0, 1, 0 }, { -1, 0,0 }, { 0,-1, 0 },
+ { -1, 0,0 }, { 0,-1, 0 }, { 1, 0,0 }, { 0, 1, 0 },
+ },
+ { { 0, 0,-1 }, { 0, 1,0 }, { 0, 0, 1 }, { 0,-1,0 },
+ { 0, 0,-1 }, { -1, 0,0 }, { 0, 0, 1 }, { 1, 0,0 },
+ { 0, 0,-1 }, { 0,-1,0 }, { 0, 0, 1 }, { 0, 1,0 },
+ { 0, 0,-1 }, { 1, 0,0 }, { 0, 0, 1 }, { -1, 0,0 },
+
+ { 0,-1, 0 }, { 1, 0,0 }, { 0, 1, 0 }, { -1, 0,0 },
+ { 0, 1, 0 }, { -1, 0,0 }, { 0,-1, 0 }, { 1, 0,0 },
+ { 0,-1, 0 }, { 1, 0,0 }, { 0, 1, 0 }, { -1, 0,0 },
+ { 0, 1, 0 }, { -1, 0,0 }, { 0,-1, 0 }, { 1, 0,0 },
+ },
+};
+
+#define STBVOX_RSQRT2 0.7071067811865f
+#define STBVOX_RSQRT3 0.5773502691896f
+
+static float stbvox_default_normals[32][3] =
+{
+ { 1,0,0 }, // east
+ { 0,1,0 }, // north
+ { -1,0,0 }, // west
+ { 0,-1,0 }, // south
+ { 0,0,1 }, // up
+ { 0,0,-1 }, // down
+ { STBVOX_RSQRT2,0, STBVOX_RSQRT2 }, // east & up
+ { STBVOX_RSQRT2,0, -STBVOX_RSQRT2 }, // east & down
+
+ { STBVOX_RSQRT2,0, STBVOX_RSQRT2 }, // east & up
+ { 0, STBVOX_RSQRT2, STBVOX_RSQRT2 }, // north & up
+ { -STBVOX_RSQRT2,0, STBVOX_RSQRT2 }, // west & up
+ { 0,-STBVOX_RSQRT2, STBVOX_RSQRT2 }, // south & up
+ { STBVOX_RSQRT3, STBVOX_RSQRT3, STBVOX_RSQRT3 }, // ne & up
+ { STBVOX_RSQRT3, STBVOX_RSQRT3,-STBVOX_RSQRT3 }, // ne & down
+ { 0, STBVOX_RSQRT2, STBVOX_RSQRT2 }, // north & up
+ { 0, STBVOX_RSQRT2, -STBVOX_RSQRT2 }, // north & down
+
+ { STBVOX_RSQRT2,0, -STBVOX_RSQRT2 }, // east & down
+ { 0, STBVOX_RSQRT2, -STBVOX_RSQRT2 }, // north & down
+ { -STBVOX_RSQRT2,0, -STBVOX_RSQRT2 }, // west & down
+ { 0,-STBVOX_RSQRT2, -STBVOX_RSQRT2 }, // south & down
+ { -STBVOX_RSQRT3, STBVOX_RSQRT3, STBVOX_RSQRT3 }, // NW & up
+ { -STBVOX_RSQRT3, STBVOX_RSQRT3,-STBVOX_RSQRT3 }, // NW & down
+ { -STBVOX_RSQRT2,0, STBVOX_RSQRT2 }, // west & up
+ { -STBVOX_RSQRT2,0, -STBVOX_RSQRT2 }, // west & down
+
+ { STBVOX_RSQRT3, STBVOX_RSQRT3,STBVOX_RSQRT3 }, // NE & up crossed
+ { -STBVOX_RSQRT3, STBVOX_RSQRT3,STBVOX_RSQRT3 }, // NW & up crossed
+ { -STBVOX_RSQRT3,-STBVOX_RSQRT3,STBVOX_RSQRT3 }, // SW & up crossed
+ { STBVOX_RSQRT3,-STBVOX_RSQRT3,STBVOX_RSQRT3 }, // SE & up crossed
+ { -STBVOX_RSQRT3,-STBVOX_RSQRT3, STBVOX_RSQRT3 }, // SW & up
+   { -STBVOX_RSQRT3,-STBVOX_RSQRT3,-STBVOX_RSQRT3 }, // SW & down
+ { 0,-STBVOX_RSQRT2, STBVOX_RSQRT2 }, // south & up
+ { 0,-STBVOX_RSQRT2, -STBVOX_RSQRT2 }, // south & down
+};
+
+static float stbvox_default_texscale[128][4] =
+{
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+ {1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},{1,1,0,0},
+};
+
+static unsigned char stbvox_default_palette_compact[64][3] =
+{
+ { 255,255,255 }, { 238,238,238 }, { 221,221,221 }, { 204,204,204 },
+ { 187,187,187 }, { 170,170,170 }, { 153,153,153 }, { 136,136,136 },
+ { 119,119,119 }, { 102,102,102 }, { 85, 85, 85 }, { 68, 68, 68 },
+ { 51, 51, 51 }, { 34, 34, 34 }, { 17, 17, 17 }, { 0, 0, 0 },
+ { 255,240,240 }, { 255,220,220 }, { 255,160,160 }, { 255, 32, 32 },
+ { 200,120,160 }, { 200, 60,150 }, { 220,100,130 }, { 255, 0,128 },
+ { 240,240,255 }, { 220,220,255 }, { 160,160,255 }, { 32, 32,255 },
+ { 120,160,200 }, { 60,150,200 }, { 100,130,220 }, { 0,128,255 },
+ { 240,255,240 }, { 220,255,220 }, { 160,255,160 }, { 32,255, 32 },
+ { 160,200,120 }, { 150,200, 60 }, { 130,220,100 }, { 128,255, 0 },
+ { 255,255,240 }, { 255,255,220 }, { 220,220,180 }, { 255,255, 32 },
+ { 200,160,120 }, { 200,150, 60 }, { 220,130,100 }, { 255,128, 0 },
+ { 255,240,255 }, { 255,220,255 }, { 220,180,220 }, { 255, 32,255 },
+ { 160,120,200 }, { 150, 60,200 }, { 130,100,220 }, { 128, 0,255 },
+ { 240,255,255 }, { 220,255,255 }, { 180,220,220 }, { 32,255,255 },
+ { 120,200,160 }, { 60,200,150 }, { 100,220,130 }, { 0,255,128 },
+};
+
+static float stbvox_default_ambient[4][4] =
+{
+   { 0,0,1,0 }, // reversed lighting direction
+ { 0.5,0.5,0.5,0 }, // directional color
+ { 0.5,0.5,0.5,0 }, // constant color
+ { 0.5,0.5,0.5,1.0f/1000.0f/1000.0f }, // fog data for simple_fog
+};
+
+static float stbvox_default_palette[64][4];
+
+static void stbvox_build_default_palette(void)
+{
+ int i;
+ for (i=0; i < 64; ++i) {
+ stbvox_default_palette[i][0] = stbvox_default_palette_compact[i][0] / 255.0f;
+ stbvox_default_palette[i][1] = stbvox_default_palette_compact[i][1] / 255.0f;
+ stbvox_default_palette[i][2] = stbvox_default_palette_compact[i][2] / 255.0f;
+ stbvox_default_palette[i][3] = 1.0f;
+ }
+}
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Shaders
+//
+
+#if defined(STBVOX_ICONFIG_OPENGL_3_1_COMPATIBILITY)
+ #define STBVOX_SHADER_VERSION "#version 150 compatibility\n"
+#elif defined(STBVOX_ICONFIG_OPENGL_3_0)
+ #define STBVOX_SHADER_VERSION "#version 130\n"
+#elif defined(STBVOX_ICONFIG_GLSL)
+ #define STBVOX_SHADER_VERSION "#version 150\n"
+#else
+ #define STBVOX_SHADER_VERSION ""
+#endif
+
+static const char *stbvox_vertex_program =
+{
+ STBVOX_SHADER_VERSION
+
+ #ifdef STBVOX_ICONFIG_FACE_ATTRIBUTE // NOT TAG_face_sampled
+ "in uvec4 attr_face;\n"
+ #else
+ "uniform usamplerBuffer facearray;\n"
+ #endif
+
+ #ifdef STBVOX_ICONFIG_FACE_ARRAY_2
+ "uniform usamplerBuffer facearray2;\n"
+ #endif
+
+ // vertex input data
+ "in uint attr_vertex;\n"
+
+ // per-buffer data
+ "uniform vec3 transform[3];\n"
+
+ // per-frame data
+ "uniform vec4 camera_pos;\n" // 4th value is used for arbitrary hacking
+
+   // to simplify things, we avoid using more than 256 uniform vectors
+   // in the fragment shader to avoid a possible 1024-component limit, so
+   // we access this table in the vertex shader.
+ "uniform vec3 normal_table[32];\n"
+
+ #ifndef STBVOX_CONFIG_OPENGL_MODELVIEW
+ "uniform mat4x4 model_view;\n"
+ #endif
+
+ // fragment output data
+ "flat out uvec4 facedata;\n"
+ " out vec3 voxelspace_pos;\n"
+ " out vec3 vnormal;\n"
+ " out float texlerp;\n"
+ " out float amb_occ;\n"
+
+ // @TODO handle the HLSL way to do this
+ "void main()\n"
+ "{\n"
+ #ifdef STBVOX_ICONFIG_FACE_ATTRIBUTE
+ " facedata = attr_face;\n"
+ #else
+ " int faceID = gl_VertexID >> 2;\n"
+ " facedata = texelFetch(facearray, faceID);\n"
+ #endif
+
+ // extract data for vertex
+ " vec3 offset;\n"
+ " offset.x = float( (attr_vertex ) & 127u );\n" // a[0..6]
+ " offset.y = float( (attr_vertex >> 7u) & 127u );\n" // a[7..13]
+ " offset.z = float( (attr_vertex >> 14u) & 511u );\n" // a[14..22]
+ " amb_occ = float( (attr_vertex >> 23u) & 63u ) / 63.0;\n" // a[23..28]
+ " texlerp = float( (attr_vertex >> 29u) ) / 7.0;\n" // a[29..31]
+
+ " vnormal = normal_table[(facedata.w>>2u) & 31u];\n"
+ " voxelspace_pos = offset * transform[0];\n" // mesh-to-object scale
+ " vec3 position = voxelspace_pos + transform[1];\n" // mesh-to-object translate
+
+ #ifdef STBVOX_DEBUG_TEST_NORMALS
+ " if ((facedata.w & 28u) == 16u || (facedata.w & 28u) == 24u)\n"
+ " position += vnormal.xyz * camera_pos.w;\n"
+ #endif
+
+ #ifndef STBVOX_CONFIG_OPENGL_MODELVIEW
+ " gl_Position = model_view * vec4(position,1.0);\n"
+ #else
+ " gl_Position = gl_ModelViewProjectionMatrix * vec4(position,1.0);\n"
+ #endif
+
+ "}\n"
+};
+
+
+static const char *stbvox_fragment_program =
+{
+ STBVOX_SHADER_VERSION
+
+ // rlerp is lerp but with t on the left, like god intended
+ #if defined(STBVOX_ICONFIG_GLSL)
+ "#define rlerp(t,x,y) mix(x,y,t)\n"
+ #elif defined(STBVOX_CONFIG_HLSL)
+ "#define rlerp(t,x,y) lerp(x,y,t)\n"
+ #else
+ #error "need definition of rlerp()"
+ #endif
+
+
+ // vertex-shader output data
+ "flat in uvec4 facedata;\n"
+ " in vec3 voxelspace_pos;\n"
+ " in vec3 vnormal;\n"
+ " in float texlerp;\n"
+ " in float amb_occ;\n"
+
+ // per-buffer data
+ "uniform vec3 transform[3];\n"
+
+ // per-frame data
+ "uniform vec4 camera_pos;\n" // 4th value is used for arbitrary hacking
+
+ // probably constant data
+ "uniform vec4 ambient[4];\n"
+
+ #ifndef STBVOX_ICONFIG_UNTEXTURED
+ // generally constant data
+ "uniform sampler2DArray tex_array[2];\n"
+
+ #ifdef STBVOX_CONFIG_PREFER_TEXBUFFER
+ "uniform samplerBuffer color_table;\n"
+ "uniform samplerBuffer texscale;\n"
+ "uniform samplerBuffer texgen;\n"
+ #else
+ "uniform vec4 color_table[64];\n"
+ "uniform vec4 texscale[64];\n" // instead of 128, to avoid running out of uniforms
+ "uniform vec3 texgen[64];\n"
+ #endif
+ #endif
+
+ "out vec4 outcolor;\n"
+
+ #if defined(STBVOX_CONFIG_LIGHTING) || defined(STBVOX_CONFIG_LIGHTING_SIMPLE)
+ "vec3 compute_lighting(vec3 pos, vec3 norm, vec3 albedo, vec3 ambient);\n"
+ #endif
+ #if defined(STBVOX_CONFIG_FOG) || defined(STBVOX_CONFIG_FOG_SMOOTHSTEP)
+ "vec3 compute_fog(vec3 color, vec3 relative_pos, float fragment_alpha);\n"
+ #endif
+
+ "void main()\n"
+ "{\n"
+ " vec3 albedo;\n"
+ " float fragment_alpha;\n"
+
+ #ifndef STBVOX_ICONFIG_UNTEXTURED
+ // unpack the values
+ " uint tex1_id = facedata.x;\n"
+ " uint tex2_id = facedata.y;\n"
+ " uint texprojid = facedata.w & 31u;\n"
+ " uint color_id = facedata.z;\n"
+
+ #ifndef STBVOX_CONFIG_PREFER_TEXBUFFER
+ // load from uniforms / texture buffers
+ " vec3 texgen_s = texgen[texprojid];\n"
+ " vec3 texgen_t = texgen[texprojid+32u];\n"
+ " float tex1_scale = texscale[tex1_id & 63u].x;\n"
+ " vec4 color = color_table[color_id & 63u];\n"
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ " vec4 tex2_props = texscale[tex2_id & 63u];\n"
+ #endif
+ #else
+ " vec3 texgen_s = texelFetch(texgen, int(texprojid)).xyz;\n"
+ " vec3 texgen_t = texelFetch(texgen, int(texprojid+32u)).xyz;\n"
+ " float tex1_scale = texelFetch(texscale, int(tex1_id & 127u)).x;\n"
+ " vec4 color = texelFetch(color_table, int(color_id & 63u));\n"
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+         " vec4 tex2_props = texelFetch(texscale, int(tex2_id & 127u));\n"
+ #endif
+ #endif
+
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ " float tex2_scale = tex2_props.y;\n"
+ " bool texblend_mode = tex2_props.z != 0.0;\n"
+ #endif
+ " vec2 texcoord;\n"
+ " vec3 texturespace_pos = voxelspace_pos + transform[2].xyz;\n"
+ " texcoord.s = dot(texturespace_pos, texgen_s);\n"
+ " texcoord.t = dot(texturespace_pos, texgen_t);\n"
+
+ " vec2 texcoord_1 = tex1_scale * texcoord;\n"
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ " vec2 texcoord_2 = tex2_scale * texcoord;\n"
+ #endif
+
+ #ifdef STBVOX_CONFIG_TEX1_EDGE_CLAMP
+ " texcoord_1 = texcoord_1 - floor(texcoord_1);\n"
+ " vec4 tex1 = textureGrad(tex_array[0], vec3(texcoord_1, float(tex1_id)), dFdx(tex1_scale*texcoord), dFdy(tex1_scale*texcoord));\n"
+ #else
+ " vec4 tex1 = texture(tex_array[0], vec3(texcoord_1, float(tex1_id)));\n"
+ #endif
+
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ #ifdef STBVOX_CONFIG_TEX2_EDGE_CLAMP
+ " texcoord_2 = texcoord_2 - floor(texcoord_2);\n"
+ " vec4 tex2 = textureGrad(tex_array[0], vec3(texcoord_2, float(tex2_id)), dFdx(tex2_scale*texcoord), dFdy(tex2_scale*texcoord));\n"
+ #else
+ " vec4 tex2 = texture(tex_array[1], vec3(texcoord_2, float(tex2_id)));\n"
+ #endif
+ #endif
+
+ " bool emissive = (color.a > 1.0);\n"
+ " color.a = min(color.a, 1.0);\n"
+
+ // recolor textures
+ " if ((color_id & 64u) != 0u) tex1.rgba *= color.rgba;\n"
+ " fragment_alpha = tex1.a;\n"
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ " if ((color_id & 128u) != 0u) tex2.rgba *= color.rgba;\n"
+
+ #ifdef STBVOX_CONFIG_PREMULTIPLIED_ALPHA
+ " tex2.rgba *= texlerp;\n"
+ #else
+ " tex2.a *= texlerp;\n"
+ #endif
+
+ " if (texblend_mode)\n"
+ " albedo = tex1.xyz * rlerp(tex2.a, vec3(1.0,1.0,1.0), 2.0*tex2.xyz);\n"
+ " else {\n"
+ #ifdef STBVOX_CONFIG_PREMULTIPLIED_ALPHA
+ " albedo = (1.0-tex2.a)*tex1.xyz + tex2.xyz;\n"
+ #else
+ " albedo = rlerp(tex2.a, tex1.xyz, tex2.xyz);\n"
+ #endif
+      " fragment_alpha = tex1.a*(1.0-tex2.a)+tex2.a;\n"
+ " }\n"
+ #else
+ " albedo = tex1.xyz;\n"
+ #endif
+
+ #else // UNTEXTURED
+      " vec4 color;\n"
+ " color.xyz = vec3(facedata.xyz) / 255.0;\n"
+ " bool emissive = false;\n"
+ " albedo = color.xyz;\n"
+ " fragment_alpha = 1.0;\n"
+ #endif
+
+ #ifdef STBVOX_ICONFIG_VARYING_VERTEX_NORMALS
+ // currently, there are no modes that trigger this path; idea is that there
+ // could be a couple of bits per vertex to perturb the normal to e.g. get curved look
+ " vec3 normal = normalize(vnormal);\n"
+ #else
+ " vec3 normal = vnormal;\n"
+ #endif
+
+ " vec3 ambient_color = dot(normal, ambient[0].xyz) * ambient[1].xyz + ambient[2].xyz;\n"
+
+   " ambient_color = clamp(ambient_color, 0.0, 1.0);\n"
+ " ambient_color *= amb_occ;\n"
+
+ " vec3 lit_color;\n"
+ " if (!emissive)\n"
+   #if defined(STBVOX_CONFIG_LIGHTING) || defined(STBVOX_CONFIG_LIGHTING_SIMPLE)
+ " lit_color = compute_lighting(voxelspace_pos + transform[1], normal, albedo, ambient_color);\n"
+ #else
+ " lit_color = albedo * ambient_color ;\n"
+ #endif
+ " else\n"
+ " lit_color = albedo;\n"
+
+   #if defined(STBVOX_CONFIG_FOG) || defined(STBVOX_CONFIG_FOG_SMOOTHSTEP)
+ " vec3 dist = voxelspace_pos + (transform[1] - camera_pos.xyz);\n"
+ " lit_color = compute_fog(lit_color, dist, fragment_alpha);\n"
+ #endif
+
+ #ifdef STBVOX_CONFIG_UNPREMULTIPLY
+ " vec4 final_color = vec4(lit_color/fragment_alpha, fragment_alpha);\n"
+ #else
+ " vec4 final_color = vec4(lit_color, fragment_alpha);\n"
+ #endif
+ " outcolor = final_color;\n"
+ "}\n"
+
+ #ifdef STBVOX_CONFIG_LIGHTING_SIMPLE
+ "\n"
+ "uniform vec3 light_source[2];\n"
+ "vec3 compute_lighting(vec3 pos, vec3 norm, vec3 albedo, vec3 ambient)\n"
+ "{\n"
+ " vec3 light_dir = light_source[0] - pos;\n"
+ " float lambert = dot(light_dir, norm) / dot(light_dir, light_dir);\n"
+ " vec3 diffuse = clamp(light_source[1] * clamp(lambert, 0.0, 1.0), 0.0, 1.0);\n"
+ " return (diffuse + ambient) * albedo;\n"
+ "}\n"
+ #endif
+
+ #ifdef STBVOX_CONFIG_FOG_SMOOTHSTEP
+ "\n"
+ "vec3 compute_fog(vec3 color, vec3 relative_pos, float fragment_alpha)\n"
+ "{\n"
+ " float f = dot(relative_pos,relative_pos)*ambient[3].w;\n"
+ //" f = rlerp(f, -2,1);\n"
+ " f = clamp(f, 0.0, 1.0);\n"
+ " f = 3.0*f*f - 2.0*f*f*f;\n" // smoothstep
+ //" f = f*f;\n" // fade in more smoothly
+ #ifdef STBVOX_CONFIG_PREMULTIPLIED_ALPHA
+ " return rlerp(f, color.xyz, ambient[3].xyz*fragment_alpha);\n"
+ #else
+ " return rlerp(f, color.xyz, ambient[3].xyz);\n"
+ #endif
+ "}\n"
+ #endif
+};
+
+
+// still requires full alpha lookups, including tex2 if texblend is enabled
+static const char *stbvox_fragment_program_alpha_only =
+{
+ STBVOX_SHADER_VERSION
+
+ // vertex-shader output data
+ "flat in uvec4 facedata;\n"
+ " in vec3 voxelspace_pos;\n"
+ " in float texlerp;\n"
+
+ // per-buffer data
+ "uniform vec3 transform[3];\n"
+
+ #ifndef STBVOX_ICONFIG_UNTEXTURED
+ // generally constant data
+ "uniform sampler2DArray tex_array[2];\n"
+
+   #ifdef STBVOX_CONFIG_PREFER_TEXBUFFER
+      "uniform samplerBuffer color_table;\n"
+      "uniform samplerBuffer texscale;\n"
+      "uniform samplerBuffer texgen;\n"
+   #else
+      "uniform vec4 color_table[64];\n"
+      "uniform vec4 texscale[64];\n" // instead of 128, to avoid running out of uniforms
+      "uniform vec3 texgen[64];\n"
+   #endif
+ #endif
+
+ "out vec4 outcolor;\n"
+
+ "void main()\n"
+ "{\n"
+ " vec3 albedo;\n"
+ " float fragment_alpha;\n"
+
+ #ifndef STBVOX_ICONFIG_UNTEXTURED
+ // unpack the values
+ " uint tex1_id = facedata.x;\n"
+ " uint tex2_id = facedata.y;\n"
+ " uint texprojid = facedata.w & 31u;\n"
+ " uint color_id = facedata.z;\n"
+
+ #ifndef STBVOX_CONFIG_PREFER_TEXBUFFER
+ // load from uniforms / texture buffers
+ " vec3 texgen_s = texgen[texprojid];\n"
+ " vec3 texgen_t = texgen[texprojid+32u];\n"
+ " float tex1_scale = texscale[tex1_id & 63u].x;\n"
+ " vec4 color = color_table[color_id & 63u];\n"
+ " vec4 tex2_props = texscale[tex2_id & 63u];\n"
+ #else
+ " vec3 texgen_s = texelFetch(texgen, int(texprojid)).xyz;\n"
+ " vec3 texgen_t = texelFetch(texgen, int(texprojid+32u)).xyz;\n"
+ " float tex1_scale = texelFetch(texscale, int(tex1_id & 127u)).x;\n"
+ " vec4 color = texelFetch(color_table, int(color_id & 63u));\n"
+ " vec4 tex2_props = texelFetch(texscale, int(tex2_id & 127u));\n"
+ #endif
+
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ " float tex2_scale = tex2_props.y;\n"
+      " bool texblend_mode = tex2_props.z != 0.0;\n"
+ #endif
+
+ " color.a = min(color.a, 1.0);\n"
+
+ " vec2 texcoord;\n"
+ " vec3 texturespace_pos = voxelspace_pos + transform[2].xyz;\n"
+ " texcoord.s = dot(texturespace_pos, texgen_s);\n"
+ " texcoord.t = dot(texturespace_pos, texgen_t);\n"
+
+ " vec2 texcoord_1 = tex1_scale * texcoord;\n"
+ " vec2 texcoord_2 = tex2_scale * texcoord;\n"
+
+ #ifdef STBVOX_CONFIG_TEX1_EDGE_CLAMP
+ " texcoord_1 = texcoord_1 - floor(texcoord_1);\n"
+ " vec4 tex1 = textureGrad(tex_array[0], vec3(texcoord_1, float(tex1_id)), dFdx(tex1_scale*texcoord), dFdy(tex1_scale*texcoord));\n"
+ #else
+ " vec4 tex1 = texture(tex_array[0], vec3(texcoord_1, float(tex1_id)));\n"
+ #endif
+
+ " if ((color_id & 64u) != 0u) tex1.a *= color.a;\n"
+ " fragment_alpha = tex1.a;\n"
+
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ " if (!texblend_mode) {\n"
+ #ifdef STBVOX_CONFIG_TEX2_EDGE_CLAMP
+ " texcoord_2 = texcoord_2 - floor(texcoord_2);\n"
+ " vec4 tex2 = textureGrad(tex_array[0], vec3(texcoord_2, float(tex2_id)), dFdx(tex2_scale*texcoord), dFdy(tex2_scale*texcoord));\n"
+ #else
+ " vec4 tex2 = texture(tex_array[1], vec3(texcoord_2, float(tex2_id)));\n"
+ #endif
+
+ " tex2.a *= texlerp;\n"
+ " if ((color_id & 128u) != 0u) tex2.rgba *= color.a;\n"
+      " fragment_alpha = tex1.a*(1.0-tex2.a)+tex2.a;\n"
+ "}\n"
+ "\n"
+ #endif
+
+ #else // UNTEXTURED
+ " fragment_alpha = 1.0;\n"
+ #endif
+
+ " outcolor = vec4(0.0, 0.0, 0.0, fragment_alpha);\n"
+ "}\n"
+};
+
+
+STBVXDEC char *stbvox_get_vertex_shader(void)
+{
+ return (char *) stbvox_vertex_program;
+}
+
+STBVXDEC char *stbvox_get_fragment_shader(void)
+{
+ return (char *) stbvox_fragment_program;
+}
+
+STBVXDEC char *stbvox_get_fragment_shader_alpha_only(void)
+{
+ return (char *) stbvox_fragment_program_alpha_only;
+}
+
+static float stbvox_dummy_transform[3][3];
+
+#ifdef STBVOX_CONFIG_PREFER_TEXBUFFER
+#define STBVOX_TEXBUF 1
+#else
+#define STBVOX_TEXBUF 0
+#endif
+
+static stbvox_uniform_info stbvox_uniforms[] =
+{
+ { STBVOX_UNIFORM_TYPE_sampler , 4, 1, (char*) "facearray" , 0 },
+ { STBVOX_UNIFORM_TYPE_vec3 , 12, 3, (char*) "transform" , stbvox_dummy_transform[0] },
+ { STBVOX_UNIFORM_TYPE_sampler , 4, 2, (char*) "tex_array" , 0 },
+ { STBVOX_UNIFORM_TYPE_vec4 , 16, 128, (char*) "texscale" , stbvox_default_texscale[0] , STBVOX_TEXBUF },
+ { STBVOX_UNIFORM_TYPE_vec4 , 16, 64, (char*) "color_table" , stbvox_default_palette[0] , STBVOX_TEXBUF },
+ { STBVOX_UNIFORM_TYPE_vec3 , 12, 32, (char*) "normal_table" , stbvox_default_normals[0] },
+ { STBVOX_UNIFORM_TYPE_vec3 , 12, 64, (char*) "texgen" , stbvox_default_texgen[0][0], STBVOX_TEXBUF },
+ { STBVOX_UNIFORM_TYPE_vec4 , 16, 4, (char*) "ambient" , stbvox_default_ambient[0] },
+ { STBVOX_UNIFORM_TYPE_vec4 , 16, 1, (char*) "camera_pos" , stbvox_dummy_transform[0] },
+};
+
+STBVXDEC int stbvox_get_uniform_info(stbvox_uniform_info *info, int uniform)
+{
+ if (uniform < 0 || uniform >= STBVOX_UNIFORM_count)
+ return 0;
+
+ *info = stbvox_uniforms[uniform];
+ return 1;
+}
+
+#define STBVOX_GET_GEO(geom_data) ((geom_data) & 15)
+
+typedef struct
+{
+ unsigned char block:2;
+ unsigned char overlay:2;
+ unsigned char facerot:2;
+ unsigned char ecolor:2;
+} stbvox_rotate;
+
+typedef struct
+{
+ unsigned char x,y,z;
+} stbvox_pos;
+
+static unsigned char stbvox_rotate_face[6][4] =
+{
+ { 0,1,2,3 },
+ { 1,2,3,0 },
+ { 2,3,0,1 },
+ { 3,0,1,2 },
+ { 4,4,4,4 },
+ { 5,5,5,5 },
+};
+
+#define STBVOX_ROTATE(x,r) stbvox_rotate_face[x][r] // (((x)+(r))&3)
+
+static stbvox_mesh_face stbvox_compute_mesh_face_value(stbvox_mesh_maker *mm, stbvox_rotate rot, int face, int v_off, int normal)
+{
+ stbvox_mesh_face face_data = { 0 };
+ stbvox_block_type bt = mm->input.blocktype[v_off];
+ unsigned char bt_face = STBVOX_ROTATE(face, rot.block);
+ int facerot = rot.facerot;
+
+ #ifdef STBVOX_ICONFIG_UNTEXTURED
+ if (mm->input.rgb) {
+ face_data.tex1 = mm->input.rgb[v_off].r;
+ face_data.tex2 = mm->input.rgb[v_off].g;
+ face_data.color = mm->input.rgb[v_off].b;
+ face_data.face_info = (normal<<2);
+ return face_data;
+ }
+ #else
+ unsigned char color_face;
+
+ if (mm->input.color)
+ face_data.color = mm->input.color[v_off];
+
+ if (mm->input.block_tex1)
+ face_data.tex1 = mm->input.block_tex1[bt];
+ else if (mm->input.block_tex1_face)
+ face_data.tex1 = mm->input.block_tex1_face[bt][bt_face];
+ else
+ face_data.tex1 = bt;
+
+ if (mm->input.block_tex2)
+ face_data.tex2 = mm->input.block_tex2[bt];
+ else if (mm->input.block_tex2_face)
+ face_data.tex2 = mm->input.block_tex2_face[bt][bt_face];
+
+ if (mm->input.block_color) {
+ unsigned char mcol = mm->input.block_color[bt];
+ if (mcol)
+ face_data.color = mcol;
+ } else if (mm->input.block_color_face) {
+ unsigned char mcol = mm->input.block_color_face[bt][bt_face];
+ if (mcol)
+ face_data.color = mcol;
+ }
+
+ if (face <= STBVOX_FACE_south) {
+ if (mm->input.side_texrot)
+ facerot = mm->input.side_texrot[v_off] >> (2 * face);
+ else if (mm->input.block_side_texrot)
+         facerot = mm->input.block_side_texrot[bt] >> (2 * bt_face);
+ }
+
+ if (mm->input.overlay) {
+ int over_face = STBVOX_ROTATE(face, rot.overlay);
+ unsigned char over = mm->input.overlay[v_off];
+ if (over) {
+ if (mm->input.overlay_tex1) {
+ unsigned char rep1 = mm->input.overlay_tex1[over][over_face];
+ if (rep1)
+ face_data.tex1 = rep1;
+ }
+ if (mm->input.overlay_tex2) {
+ unsigned char rep2 = mm->input.overlay_tex2[over][over_face];
+ if (rep2)
+ face_data.tex2 = rep2;
+ }
+ if (mm->input.overlay_color) {
+ unsigned char rep3 = mm->input.overlay_color[over][over_face];
+ if (rep3)
+ face_data.color = rep3;
+ }
+
+ if (mm->input.overlay_side_texrot && face <= STBVOX_FACE_south)
+ facerot = mm->input.overlay_side_texrot[over] >> (2*over_face);
+ }
+ }
+
+ if (mm->input.tex2_for_tex1)
+ face_data.tex2 = mm->input.tex2_for_tex1[face_data.tex1];
+ if (mm->input.tex2)
+ face_data.tex2 = mm->input.tex2[v_off];
+ if (mm->input.tex2_replace) {
+ if (mm->input.tex2_facemask[v_off] & (1 << face))
+ face_data.tex2 = mm->input.tex2_replace[v_off];
+ }
+
+ color_face = STBVOX_ROTATE(face, rot.ecolor);
+ if (mm->input.extended_color) {
+ unsigned char ec = mm->input.extended_color[v_off];
+ if (mm->input.ecolor_facemask[ec] & (1 << color_face))
+ face_data.color = mm->input.ecolor_color[ec];
+ }
+
+ if (mm->input.color2) {
+ if (mm->input.color2_facemask[v_off] & (1 << color_face))
+ face_data.color = mm->input.color2[v_off];
+ if (mm->input.color3 && (mm->input.color3_facemask[v_off] & (1 << color_face)))
+ face_data.color = mm->input.color3[v_off];
+ }
+ #endif
+
+ face_data.face_info = (normal<<2) + facerot;
+ return face_data;
+}
+
+// these are the types of faces each block can have
+enum
+{
+ STBVOX_FT_none ,
+ STBVOX_FT_upper ,
+ STBVOX_FT_lower ,
+ STBVOX_FT_solid ,
+ STBVOX_FT_diag_012,
+ STBVOX_FT_diag_023,
+ STBVOX_FT_diag_013,
+ STBVOX_FT_diag_123,
+ STBVOX_FT_force , // can't be covered up, used for internal faces, also hides nothing
+ STBVOX_FT_partial , // only covered by solid, never covers anything else
+
+ STBVOX_FT_count
+};
+
+static unsigned char stbvox_face_lerp[6] = { 0,2,0,2,4,4 };
+static unsigned char stbvox_vert3_lerp[5] = { 0,3,6,9,12 };
+static unsigned char stbvox_vert_lerp_for_face_lerp[4] = { 0, 4, 7, 7 };
+static unsigned char stbvox_face3_lerp[6] = { 0,3,6,9,12,14 };
+static unsigned char stbvox_vert_lerp_for_simple[4] = { 0,2,5,7 };
+static unsigned char stbvox_face3_updown[8] = { 0,2,5,7,0,2,5,7 }; // ignore top bit
+
+// vertex offsets for face vertices
+static unsigned char stbvox_vertex_vector[6][4][3] =
+{
+ { { 1,0,1 }, { 1,1,1 }, { 1,1,0 }, { 1,0,0 } }, // east
+ { { 1,1,1 }, { 0,1,1 }, { 0,1,0 }, { 1,1,0 } }, // north
+ { { 0,1,1 }, { 0,0,1 }, { 0,0,0 }, { 0,1,0 } }, // west
+ { { 0,0,1 }, { 1,0,1 }, { 1,0,0 }, { 0,0,0 } }, // south
+ { { 0,1,1 }, { 1,1,1 }, { 1,0,1 }, { 0,0,1 } }, // up
+ { { 0,0,0 }, { 1,0,0 }, { 1,1,0 }, { 0,1,0 } }, // down
+};
+
+// stbvox_vertex_vector, but read coordinates as binary numbers, zyx
+static unsigned char stbvox_vertex_selector[6][4] =
+{
+ { 5,7,3,1 },
+ { 7,6,2,3 },
+ { 6,4,0,2 },
+ { 4,5,1,0 },
+ { 6,7,5,4 },
+ { 0,1,3,2 },
+};
+
+static stbvox_mesh_vertex stbvox_vmesh_delta_normal[6][4] =
+{
+ { stbvox_vertex_encode(1,0,1,0,0) ,
+ stbvox_vertex_encode(1,1,1,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) },
+ { stbvox_vertex_encode(1,1,1,0,0) ,
+ stbvox_vertex_encode(0,1,1,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) },
+ { stbvox_vertex_encode(0,1,1,0,0) ,
+ stbvox_vertex_encode(0,0,1,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) },
+ { stbvox_vertex_encode(0,0,1,0,0) ,
+ stbvox_vertex_encode(1,0,1,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) },
+ { stbvox_vertex_encode(0,1,1,0,0) ,
+ stbvox_vertex_encode(1,1,1,0,0) ,
+ stbvox_vertex_encode(1,0,1,0,0) ,
+ stbvox_vertex_encode(0,0,1,0,0) },
+ { stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) }
+};
+
+static stbvox_mesh_vertex stbvox_vmesh_pre_vheight[6][4] =
+{
+ { stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) },
+ { stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) },
+ { stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) },
+ { stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) },
+ { stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) },
+ { stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) }
+};
+
+static stbvox_mesh_vertex stbvox_vmesh_delta_half_z[6][4] =
+{
+ { stbvox_vertex_encode(1,0,2,0,0) ,
+ stbvox_vertex_encode(1,1,2,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) },
+ { stbvox_vertex_encode(1,1,2,0,0) ,
+ stbvox_vertex_encode(0,1,2,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) },
+ { stbvox_vertex_encode(0,1,2,0,0) ,
+ stbvox_vertex_encode(0,0,2,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) },
+ { stbvox_vertex_encode(0,0,2,0,0) ,
+ stbvox_vertex_encode(1,0,2,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) },
+ { stbvox_vertex_encode(0,1,2,0,0) ,
+ stbvox_vertex_encode(1,1,2,0,0) ,
+ stbvox_vertex_encode(1,0,2,0,0) ,
+ stbvox_vertex_encode(0,0,2,0,0) },
+ { stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) }
+};
+
+static stbvox_mesh_vertex stbvox_vmesh_crossed_pair[6][4] =
+{
+ { stbvox_vertex_encode(1,0,2,0,0) ,
+ stbvox_vertex_encode(0,1,2,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) },
+ { stbvox_vertex_encode(1,1,2,0,0) ,
+ stbvox_vertex_encode(0,0,2,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) },
+ { stbvox_vertex_encode(0,1,2,0,0) ,
+ stbvox_vertex_encode(1,0,2,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) },
+ { stbvox_vertex_encode(0,0,2,0,0) ,
+ stbvox_vertex_encode(1,1,2,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(0,0,0,0,0) },
+ // not used, so we leave it non-degenerate to make sure it doesn't get gen'd accidentally
+ { stbvox_vertex_encode(0,1,2,0,0) ,
+ stbvox_vertex_encode(1,1,2,0,0) ,
+ stbvox_vertex_encode(1,0,2,0,0) ,
+ stbvox_vertex_encode(0,0,2,0,0) },
+ { stbvox_vertex_encode(0,0,0,0,0) ,
+ stbvox_vertex_encode(1,0,0,0,0) ,
+ stbvox_vertex_encode(1,1,0,0,0) ,
+ stbvox_vertex_encode(0,1,0,0,0) }
+};
+
+#define STBVOX_MAX_GEOM 16
+#define STBVOX_NUM_ROTATION 4
+
+// this is used to determine if a face is ever generated at all
+static unsigned char stbvox_hasface[STBVOX_MAX_GEOM][STBVOX_NUM_ROTATION] =
+{
+ { 0,0,0,0 }, // empty
+ { 0,0,0,0 }, // knockout
+ { 63,63,63,63 }, // solid
+ { 63,63,63,63 }, // transp
+ { 63,63,63,63 }, // slab
+ { 63,63,63,63 }, // slab
+ { 1|2|4|48, 8|1|2|48, 4|8|1|48, 2|4|8|48, }, // floor slopes
+ { 1|2|4|48, 8|1|2|48, 4|8|1|48, 2|4|8|48, }, // ceil slopes
+ { 47,47,47,47 }, // wall-projected diagonal with down face
+ { 31,31,31,31 }, // wall-projected diagonal with up face
+ { 63,63,63,63 }, // crossed-pair has special handling, but avoid early-out
+ { 63,63,63,63 }, // force
+ { 63,63,63,63 }, // vheight
+ { 63,63,63,63 }, // vheight
+ { 63,63,63,63 }, // vheight
+ { 63,63,63,63 }, // vheight
+};
+
+// this determines which face type above is visible on each side of the geometry
+static unsigned char stbvox_facetype[STBVOX_GEOM_count][6] =
+{
+ { 0, }, // STBVOX_GEOM_empty
+ { STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid }, // knockout
+ { STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid, STBVOX_FT_solid }, // solid
+ { STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force }, // transp
+
+ { STBVOX_FT_upper, STBVOX_FT_upper, STBVOX_FT_upper, STBVOX_FT_upper, STBVOX_FT_solid, STBVOX_FT_force },
+ { STBVOX_FT_lower, STBVOX_FT_lower, STBVOX_FT_lower, STBVOX_FT_lower, STBVOX_FT_force, STBVOX_FT_solid },
+ { STBVOX_FT_diag_123, STBVOX_FT_solid, STBVOX_FT_diag_023, STBVOX_FT_none, STBVOX_FT_force, STBVOX_FT_solid },
+ { STBVOX_FT_diag_012, STBVOX_FT_solid, STBVOX_FT_diag_013, STBVOX_FT_none, STBVOX_FT_solid, STBVOX_FT_force },
+
+ { STBVOX_FT_diag_123, STBVOX_FT_solid, STBVOX_FT_diag_023, STBVOX_FT_force, STBVOX_FT_none, STBVOX_FT_solid },
+ { STBVOX_FT_diag_012, STBVOX_FT_solid, STBVOX_FT_diag_013, STBVOX_FT_force, STBVOX_FT_solid, STBVOX_FT_none },
+ { STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, 0,0 }, // crossed pair
+ { STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force, STBVOX_FT_force }, // GEOM_force
+
+ { STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial, STBVOX_FT_force, STBVOX_FT_solid }, // floor vheight, all neighbors forced
+ { STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial, STBVOX_FT_force, STBVOX_FT_solid }, // floor vheight, all neighbors forced
+ { STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial, STBVOX_FT_solid, STBVOX_FT_force }, // ceil vheight, all neighbors forced
+ { STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial,STBVOX_FT_partial, STBVOX_FT_solid, STBVOX_FT_force }, // ceil vheight, all neighbors forced
+};
+
+// This table indicates what normal to use for the "up" face of a sloped geom
+// @TODO this could be done with math given the current arrangement of the enum, but let's not require it
+static unsigned char stbvox_floor_slope_for_rot[4] =
+{
+ STBVF_su,
+ STBVF_wu, // @TODO: why is this reversed from what it should be? this is a north-is-up face, so slope should be south&up
+ STBVF_nu,
+ STBVF_eu,
+};
+
+static unsigned char stbvox_ceil_slope_for_rot[4] =
+{
+ STBVF_sd,
+ STBVF_ed,
+ STBVF_nd,
+ STBVF_wd,
+};
+
+// this table indicates whether, for each pair of types above, a face is visible.
+// each value indicates whether a given type is visible for all neighbor types
+static unsigned short stbvox_face_visible[STBVOX_FT_count] =
+{
+ // we encode the table by listing which cases cause *obscuration*, and bitwise inverting that
+ // table is pre-shifted by 5 to save a shift when it's accessed
+ (unsigned short) ((~0x07ffu )<<5), // none is completely obscured by everything
+ (unsigned short) ((~((1u<<STBVOX_FT_solid) | (1u<<STBVOX_FT_upper)))<<5), // upper
+ // ... (remaining entries elided) ...
+};
+
+static void stbvox_get_quad_vertex_pointer(stbvox_mesh_maker *mm, int mesh, stbvox_mesh_vertex **vertices, stbvox_mesh_face face)
+{
+ char *p = mm->output_cur[mesh][0];
+ int step = mm->output_step[mesh][0];
+
+ // allocate a new quad from the mesh
+ vertices[0] = (stbvox_mesh_vertex *) p; p += step;
+ vertices[1] = (stbvox_mesh_vertex *) p; p += step;
+ vertices[2] = (stbvox_mesh_vertex *) p; p += step;
+ vertices[3] = (stbvox_mesh_vertex *) p; p += step;
+ mm->output_cur[mesh][0] = p;
+
+ // output the face
+ #ifdef STBVOX_ICONFIG_FACE_ATTRIBUTE
+ // write face as interleaved vertex data
+ *(stbvox_mesh_face *) (vertices[0]+1) = face;
+ *(stbvox_mesh_face *) (vertices[1]+1) = face;
+ *(stbvox_mesh_face *) (vertices[2]+1) = face;
+ *(stbvox_mesh_face *) (vertices[3]+1) = face;
+ #else
+ *(stbvox_mesh_face *) mm->output_cur[mesh][1] = face;
+ mm->output_cur[mesh][1] += 4;
+ #endif
+}
+
+void stbvox_make_mesh_for_face(stbvox_mesh_maker *mm, stbvox_rotate rot, int face, int v_off, stbvox_pos pos, stbvox_mesh_vertex vertbase, stbvox_mesh_vertex *face_coord, unsigned char mesh, int normal)
+{
+ stbvox_mesh_face face_data = stbvox_compute_mesh_face_value(mm,rot,face,v_off, normal);
+
+ // still need to compute ao & texlerp for each vertex
+
+ // first compute texlerp into p1
+ stbvox_mesh_vertex p1[4] = { 0 };
+
+ #if defined(STBVOX_CONFIG_DOWN_TEXLERP_PACKED) && defined(STBVOX_CONFIG_UP_TEXLERP_PACKED)
+ #define STBVOX_USE_PACKED(f) ((f) == STBVOX_FACE_up || (f) == STBVOX_FACE_down)
+ #elif defined(STBVOX_CONFIG_UP_TEXLERP_PACKED)
+ #define STBVOX_USE_PACKED(f) ((f) == STBVOX_FACE_up )
+ #elif defined(STBVOX_CONFIG_DOWN_TEXLERP_PACKED)
+ #define STBVOX_USE_PACKED(f) ( (f) == STBVOX_FACE_down)
+ #endif
+
+ #if defined(STBVOX_CONFIG_DOWN_TEXLERP_PACKED) || defined(STBVOX_CONFIG_UP_TEXLERP_PACKED)
+ if (STBVOX_USE_PACKED(face)) {
+ if (!mm->input.packed_compact || 0==(mm->input.packed_compact[v_off]&16))
+ goto set_default;
+ p1[0] = (mm->input.packed_compact[v_off + mm->cube_vertex_offset[face][0]] >> 5);
+ p1[1] = (mm->input.packed_compact[v_off + mm->cube_vertex_offset[face][1]] >> 5);
+ p1[2] = (mm->input.packed_compact[v_off + mm->cube_vertex_offset[face][2]] >> 5);
+ p1[3] = (mm->input.packed_compact[v_off + mm->cube_vertex_offset[face][3]] >> 5);
+ p1[0] = stbvox_vertex_encode(0,0,0,0,p1[0]);
+ p1[1] = stbvox_vertex_encode(0,0,0,0,p1[1]);
+ p1[2] = stbvox_vertex_encode(0,0,0,0,p1[2]);
+ p1[3] = stbvox_vertex_encode(0,0,0,0,p1[3]);
+ goto skip;
+ }
+ #endif
+
+ if (mm->input.block_texlerp) {
+ stbvox_block_type bt = mm->input.blocktype[v_off];
+ unsigned char val = mm->input.block_texlerp[bt];
+ p1[0] = p1[1] = p1[2] = p1[3] = stbvox_vertex_encode(0,0,0,0,val);
+ } else if (mm->input.block_texlerp_face) {
+ stbvox_block_type bt = mm->input.blocktype[v_off];
+ unsigned char bt_face = STBVOX_ROTATE(face, rot.block);
+ unsigned char val = mm->input.block_texlerp_face[bt][bt_face];
+ p1[0] = p1[1] = p1[2] = p1[3] = stbvox_vertex_encode(0,0,0,0,val);
+ } else if (mm->input.texlerp_face3) {
+ unsigned char val = (mm->input.texlerp_face3[v_off] >> stbvox_face3_lerp[face]) & 7;
+ if (face >= STBVOX_FACE_up)
+ val = stbvox_face3_updown[val];
+ p1[0] = p1[1] = p1[2] = p1[3] = stbvox_vertex_encode(0,0,0,0,val);
+ } else if (mm->input.texlerp_simple) {
+ unsigned char val = mm->input.texlerp_simple[v_off];
+ unsigned char lerp_face = (val >> 2) & 7;
+ if (lerp_face == face) {
+ p1[0] = (mm->input.texlerp_simple[v_off + mm->cube_vertex_offset[face][0]] >> 5) & 7;
+ p1[1] = (mm->input.texlerp_simple[v_off + mm->cube_vertex_offset[face][1]] >> 5) & 7;
+ p1[2] = (mm->input.texlerp_simple[v_off + mm->cube_vertex_offset[face][2]] >> 5) & 7;
+ p1[3] = (mm->input.texlerp_simple[v_off + mm->cube_vertex_offset[face][3]] >> 5) & 7;
+ p1[0] = stbvox_vertex_encode(0,0,0,0,p1[0]);
+ p1[1] = stbvox_vertex_encode(0,0,0,0,p1[1]);
+ p1[2] = stbvox_vertex_encode(0,0,0,0,p1[2]);
+ p1[3] = stbvox_vertex_encode(0,0,0,0,p1[3]);
+ } else {
+ unsigned char base = stbvox_vert_lerp_for_simple[val&3];
+ p1[0] = p1[1] = p1[2] = p1[3] = stbvox_vertex_encode(0,0,0,0,base);
+ }
+ } else if (mm->input.texlerp) {
+ unsigned char facelerp = (mm->input.texlerp[v_off] >> stbvox_face_lerp[face]) & 3;
+ if (facelerp == STBVOX_TEXLERP_FACE_use_vert) {
+ if (mm->input.texlerp_vert3 && face != STBVOX_FACE_down) {
+ unsigned char shift = stbvox_vert3_lerp[face];
+ p1[0] = (mm->input.texlerp_vert3[v_off + mm->cube_vertex_offset[face][0]] >> shift) & 7;
+ p1[1] = (mm->input.texlerp_vert3[v_off + mm->cube_vertex_offset[face][1]] >> shift) & 7;
+ p1[2] = (mm->input.texlerp_vert3[v_off + mm->cube_vertex_offset[face][2]] >> shift) & 7;
+ p1[3] = (mm->input.texlerp_vert3[v_off + mm->cube_vertex_offset[face][3]] >> shift) & 7;
+ } else {
+ p1[0] = stbvox_vert_lerp_for_simple[mm->input.texlerp[v_off + mm->cube_vertex_offset[face][0]]>>6];
+ p1[1] = stbvox_vert_lerp_for_simple[mm->input.texlerp[v_off + mm->cube_vertex_offset[face][1]]>>6];
+ p1[2] = stbvox_vert_lerp_for_simple[mm->input.texlerp[v_off + mm->cube_vertex_offset[face][2]]>>6];
+ p1[3] = stbvox_vert_lerp_for_simple[mm->input.texlerp[v_off + mm->cube_vertex_offset[face][3]]>>6];
+ }
+ p1[0] = stbvox_vertex_encode(0,0,0,0,p1[0]);
+ p1[1] = stbvox_vertex_encode(0,0,0,0,p1[1]);
+ p1[2] = stbvox_vertex_encode(0,0,0,0,p1[2]);
+ p1[3] = stbvox_vertex_encode(0,0,0,0,p1[3]);
+ } else {
+ p1[0] = p1[1] = p1[2] = p1[3] = stbvox_vertex_encode(0,0,0,0,stbvox_vert_lerp_for_face_lerp[facelerp]);
+ }
+ } else {
+ #if defined(STBVOX_CONFIG_UP_TEXLERP_PACKED) || defined(STBVOX_CONFIG_DOWN_TEXLERP_PACKED)
+ set_default:
+ #endif
+ p1[0] = p1[1] = p1[2] = p1[3] = stbvox_vertex_encode(0,0,0,0,7); // @TODO make this configurable
+ }
+
+ #if defined(STBVOX_CONFIG_UP_TEXLERP_PACKED) || defined(STBVOX_CONFIG_DOWN_TEXLERP_PACKED)
+ skip:
+ #endif
+
+ // now compute lighting and store to vertices
+ {
+ stbvox_mesh_vertex *mv[4];
+ stbvox_get_quad_vertex_pointer(mm, mesh, mv, face_data);
+
+ if (mm->input.lighting) {
+ // @TODO: lighting at block centers, but not gathered, instead constant-per-face
+ if (mm->input.lighting_at_vertices) {
+ int i;
+ for (i=0; i < 4; ++i) {
+ *mv[i] = vertbase + face_coord[i]
+ + stbvox_vertex_encode(0,0,0,mm->input.lighting[v_off + mm->cube_vertex_offset[face][i]] & 63,0)
+ + p1[i];
+ }
+ } else {
+ unsigned char *amb = &mm->input.lighting[v_off];
+ int i,j;
+ #if defined(STBVOX_CONFIG_ROTATION_IN_LIGHTING) || defined(STBVOX_CONFIG_VHEIGHT_IN_LIGHTING)
+ #define STBVOX_GET_LIGHTING(light) ((light) & ~3)
+ #define STBVOX_LIGHTING_ROUNDOFF 8
+ #else
+ #define STBVOX_GET_LIGHTING(light) (light)
+ #define STBVOX_LIGHTING_ROUNDOFF 2
+ #endif
+
+ for (i=0; i < 4; ++i) {
+ // for each vertex, gather from the four neighbor blocks it's facing
+ unsigned char *vamb = &amb[mm->cube_vertex_offset[face][i]];
+ int total=0;
+ for (j=0; j < 4; ++j)
+ total += STBVOX_GET_LIGHTING(vamb[mm->vertex_gather_offset[face][j]]);
+ *mv[i] = vertbase + face_coord[i]
+ + stbvox_vertex_encode(0,0,0,(total+STBVOX_LIGHTING_ROUNDOFF)>>4,0)
+ + p1[i];
+ // >> 4 is because:
+ // >> 2 to divide by 4 to get average over 4 samples
+ // >> 2 because input is 8 bits, output is 6 bits
+ }
+
+ // @TODO: note that gathering baked *lighting*
+ // is different from gathering baked ao; baked ao can count
+ // solid blocks as 0 ao, but baked lighting wants average
+ // of non-blocked--not take average & treat blocked as 0. And
+ // we can't bake the right value into the solid blocks
+ // because they can have different lighting values on
+ // different sides. So we need to actually gather and
+ // then divide by 0..4 (which we can do with a table-driven
+ // multiply, or have an 'if' for the 3 case)
+
+ }
+ } else {
+ vertbase += stbvox_vertex_encode(0,0,0,63,0);
+ *mv[0] = vertbase + face_coord[0] + p1[0];
+ *mv[1] = vertbase + face_coord[1] + p1[1];
+ *mv[2] = vertbase + face_coord[2] + p1[2];
+ *mv[3] = vertbase + face_coord[3] + p1[3];
+ }
+ }
+}
+
+// get opposite-facing normal & texgen for opposite face, used to map up-facing vheight data to down-facing data
+static unsigned char stbvox_reverse_face[STBVF_count] =
+{
+ STBVF_w, STBVF_s, STBVF_e, STBVF_n, STBVF_d , STBVF_u , STBVF_wd, STBVF_wu,
+ 0, 0, 0, 0, STBVF_sw_d, STBVF_sw_u, STBVF_sd, STBVF_su,
+ 0, 0, 0, 0, STBVF_se_d, STBVF_se_u, STBVF_ed, STBVF_eu,
+ 0, 0, 0, 0, STBVF_ne_d, STBVF_ne_u, STBVF_nd, STBVF_nu
+};
+
+#ifndef STBVOX_CONFIG_OPTIMIZED_VHEIGHT
+// render non-planar quads by splitting into two triangles, rendering each as a degenerate quad
+static void stbvox_make_12_split_mesh_for_face(stbvox_mesh_maker *mm, stbvox_rotate rot, int face, int v_off, stbvox_pos pos, stbvox_mesh_vertex vertbase, stbvox_mesh_vertex *face_coord, unsigned char mesh, unsigned char *ht)
+{
+ stbvox_mesh_vertex v[4];
+
+ unsigned char normal1 = stbvox_face_up_normal_012[ht[2]][ht[1]][ht[0]];
+ unsigned char normal2 = stbvox_face_up_normal_123[ht[3]][ht[2]][ht[1]];
+
+ if (face == STBVOX_FACE_down) {
+ normal1 = stbvox_reverse_face[normal1];
+ normal2 = stbvox_reverse_face[normal2];
+ }
+
+ // the floor side face_coord is stored in order NW,NE,SE,SW, but ht[] is stored SW,SE,NW,NE
+ v[0] = face_coord[2];
+ v[1] = face_coord[3];
+ v[2] = face_coord[0];
+ v[3] = face_coord[2];
+ stbvox_make_mesh_for_face(mm, rot, face, v_off, pos, vertbase, v, mesh, normal1);
+ v[1] = face_coord[0];
+ v[2] = face_coord[1];
+ stbvox_make_mesh_for_face(mm, rot, face, v_off, pos, vertbase, v, mesh, normal2);
+}
+
+static void stbvox_make_03_split_mesh_for_face(stbvox_mesh_maker *mm, stbvox_rotate rot, int face, int v_off, stbvox_pos pos, stbvox_mesh_vertex vertbase, stbvox_mesh_vertex *face_coord, unsigned char mesh, unsigned char *ht)
+{
+ stbvox_mesh_vertex v[4];
+
+ unsigned char normal1 = stbvox_face_up_normal_013[ht[3]][ht[1]][ht[0]];
+ unsigned char normal2 = stbvox_face_up_normal_023[ht[3]][ht[2]][ht[0]];
+
+ if (face == STBVOX_FACE_down) {
+ normal1 = stbvox_reverse_face[normal1];
+ normal2 = stbvox_reverse_face[normal2];
+ }
+
+ v[0] = face_coord[1];
+ v[1] = face_coord[2];
+ v[2] = face_coord[3];
+ v[3] = face_coord[1];
+ stbvox_make_mesh_for_face(mm, rot, face, v_off, pos, vertbase, v, mesh, normal1);
+ v[1] = face_coord[3];
+ v[2] = face_coord[0];
+ stbvox_make_mesh_for_face(mm, rot, face, v_off, pos, vertbase, v, mesh, normal2); // this one is correct!
+}
+#endif
+
+#ifndef STBVOX_CONFIG_PRECISION_Z
+#define STBVOX_CONFIG_PRECISION_Z 1
+#endif
+
+// simple case for mesh generation: we have only solid and empty blocks
+static void stbvox_make_mesh_for_block(stbvox_mesh_maker *mm, stbvox_pos pos, int v_off, stbvox_mesh_vertex *vmesh)
+{
+ int ns_off = mm->y_stride_in_bytes;
+ int ew_off = mm->x_stride_in_bytes;
+
+ unsigned char *blockptr = &mm->input.blocktype[v_off];
+ stbvox_mesh_vertex basevert = stbvox_vertex_encode(pos.x, pos.y, pos.z << STBVOX_CONFIG_PRECISION_Z , 0,0);
+
+ stbvox_rotate rot = { 0,0,0,0 };
+ unsigned char simple_rot = 0;
+
+ unsigned char mesh = mm->default_mesh;
+
+ if (mm->input.selector)
+ mesh = mm->input.selector[v_off];
+ else if (mm->input.block_selector)
+ mesh = mm->input.block_selector[mm->input.blocktype[v_off]];
+
+ // check if we're going off the end
+ if (mm->output_cur[mesh][0] + mm->output_size[mesh][0]*6 > mm->output_end[mesh][0]) {
+ mm->full = 1;
+ return;
+ }
+
+ #ifdef STBVOX_CONFIG_ROTATION_IN_LIGHTING
+ simple_rot = mm->input.lighting[v_off] & 3;
+ #endif
+
+ if (mm->input.packed_compact)
+ simple_rot = mm->input.packed_compact[v_off] & 3;
+
+ if (blockptr[ 1]==0) {
+ rot.facerot = simple_rot;
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_up , v_off, pos, basevert, vmesh+4*STBVOX_FACE_up, mesh, STBVOX_FACE_up);
+ }
+ if (blockptr[-1]==0) {
+ rot.facerot = (-simple_rot) & 3;
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_down, v_off, pos, basevert, vmesh+4*STBVOX_FACE_down, mesh, STBVOX_FACE_down);
+ }
+
+ if (mm->input.rotate) {
+ unsigned char val = mm->input.rotate[v_off];
+ rot.block = (val >> 0) & 3;
+ rot.overlay = (val >> 2) & 3;
+ //rot.tex2 = (val >> 4) & 3;
+ rot.ecolor = (val >> 6) & 3;
+ } else {
+ rot.block = rot.overlay = rot.ecolor = simple_rot;
+ }
+ rot.facerot = 0;
+
+ if (blockptr[ ns_off]==0)
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_north, v_off, pos, basevert, vmesh+4*STBVOX_FACE_north, mesh, STBVOX_FACE_north);
+ if (blockptr[-ns_off]==0)
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_south, v_off, pos, basevert, vmesh+4*STBVOX_FACE_south, mesh, STBVOX_FACE_south);
+ if (blockptr[ ew_off]==0)
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_east , v_off, pos, basevert, vmesh+4*STBVOX_FACE_east, mesh, STBVOX_FACE_east);
+ if (blockptr[-ew_off]==0)
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_west , v_off, pos, basevert, vmesh+4*STBVOX_FACE_west, mesh, STBVOX_FACE_west);
+}
+
+// complex case for mesh generation: we have lots of different
+// block types, and we don't want to generate faces of blocks
+// if they're hidden by neighbors.
+//
+// we use lots of tables to determine this: we have a table
+// which tells us what face type is generated for each type of
+// geometry, and then a table that tells us whether that type
+// is hidden by a neighbor.
+static void stbvox_make_mesh_for_block_with_geo(stbvox_mesh_maker *mm, stbvox_pos pos, int v_off)
+{
+ int ns_off = mm->y_stride_in_bytes;
+ int ew_off = mm->x_stride_in_bytes;
+ int visible_faces, visible_base;
+ unsigned char mesh;
+
+ // first gather the geometry info for this block and all neighbors
+
+ unsigned char bt, nbt[6];
+ unsigned char geo, ngeo[6];
+ unsigned char rot, nrot[6];
+
+ bt = mm->input.blocktype[v_off];
+ nbt[0] = mm->input.blocktype[v_off + ew_off];
+ nbt[1] = mm->input.blocktype[v_off + ns_off];
+ nbt[2] = mm->input.blocktype[v_off - ew_off];
+ nbt[3] = mm->input.blocktype[v_off - ns_off];
+ nbt[4] = mm->input.blocktype[v_off + 1];
+ nbt[5] = mm->input.blocktype[v_off - 1];
+ if (mm->input.geometry) {
+ int i;
+ geo = mm->input.geometry[v_off];
+ ngeo[0] = mm->input.geometry[v_off + ew_off];
+ ngeo[1] = mm->input.geometry[v_off + ns_off];
+ ngeo[2] = mm->input.geometry[v_off - ew_off];
+ ngeo[3] = mm->input.geometry[v_off - ns_off];
+ ngeo[4] = mm->input.geometry[v_off + 1];
+ ngeo[5] = mm->input.geometry[v_off - 1];
+
+ rot = (geo >> 4) & 3;
+ geo &= 15;
+ for (i=0; i < 6; ++i) {
+ nrot[i] = (ngeo[i] >> 4) & 3;
+ ngeo[i] &= 15;
+ }
+ } else {
+ int i;
+ assert(mm->input.block_geometry);
+ geo = mm->input.block_geometry[bt];
+ for (i=0; i < 6; ++i)
+ ngeo[i] = mm->input.block_geometry[nbt[i]];
+ if (mm->input.selector) {
+ #ifndef STBVOX_CONFIG_ROTATION_IN_LIGHTING
+ if (mm->input.packed_compact == NULL) {
+ rot = (mm->input.selector[v_off ] >> 4) & 3;
+ nrot[0] = (mm->input.selector[v_off + ew_off] >> 4) & 3;
+ nrot[1] = (mm->input.selector[v_off + ns_off] >> 4) & 3;
+ nrot[2] = (mm->input.selector[v_off - ew_off] >> 4) & 3;
+ nrot[3] = (mm->input.selector[v_off - ns_off] >> 4) & 3;
+ nrot[4] = (mm->input.selector[v_off + 1] >> 4) & 3;
+ nrot[5] = (mm->input.selector[v_off - 1] >> 4) & 3;
+ }
+ #endif
+ } else {
+ #ifndef STBVOX_CONFIG_ROTATION_IN_LIGHTING
+ if (mm->input.packed_compact == NULL) {
+ rot = (geo>>4)&3;
+ geo &= 15;
+ for (i=0; i < 6; ++i) {
+ nrot[i] = (ngeo[i]>>4)&3;
+ ngeo[i] &= 15;
+ }
+ }
+ #endif
+ }
+ }
+
+ #ifndef STBVOX_CONFIG_ROTATION_IN_LIGHTING
+ if (mm->input.packed_compact) {
+ rot = mm->input.packed_compact[v_off] & 3;
+ nrot[0] = mm->input.packed_compact[v_off + ew_off] & 3;
+ nrot[1] = mm->input.packed_compact[v_off + ns_off] & 3;
+ nrot[2] = mm->input.packed_compact[v_off - ew_off] & 3;
+ nrot[3] = mm->input.packed_compact[v_off - ns_off] & 3;
+ nrot[4] = mm->input.packed_compact[v_off + 1] & 3;
+ nrot[5] = mm->input.packed_compact[v_off - 1] & 3;
+ }
+ #else
+ rot = mm->input.lighting[v_off] & 3;
+ nrot[0] = (mm->input.lighting[v_off + ew_off]) & 3;
+ nrot[1] = (mm->input.lighting[v_off + ns_off]) & 3;
+ nrot[2] = (mm->input.lighting[v_off - ew_off]) & 3;
+ nrot[3] = (mm->input.lighting[v_off - ns_off]) & 3;
+ nrot[4] = (mm->input.lighting[v_off + 1]) & 3;
+ nrot[5] = (mm->input.lighting[v_off - 1]) & 3;
+ #endif
+
+ if (geo == STBVOX_GEOM_transp) {
+ // transparency has a special rule: if the blocktype is the same,
+ // and the faces are compatible, then we can hide them; otherwise,
+ // force them on.
+ // Note that this means we don't support any transparent shapes other
+ // than solid blocks, since detecting them is too complicated. If
+ // you wanted to do something like minecraft water, you probably
+ // should just do that with a separate renderer anyway. (We don't
+ // support transparency sorting, so you need to use alpha test
+ // anyway.)
+ int i;
+ for (i=0; i < 6; ++i)
+ if (nbt[i] != bt) {
+ nbt[i] = 0;
+ ngeo[i] = STBVOX_GEOM_empty;
+ } else
+ ngeo[i] = STBVOX_GEOM_solid;
+ geo = STBVOX_GEOM_solid;
+ }
+
+ // now compute the face visibility
+ visible_base = stbvox_hasface[geo][rot];
+ // @TODO: assert(visible_base != 0); // we should have early-outted earlier in this case
+ visible_faces = 0;
+
+ // now, for every face that might be visible, check if neighbor hides it
+ if (visible_base & (1 << STBVOX_FACE_east)) {
+ int type = stbvox_facetype[ geo ][(STBVOX_FACE_east+ rot )&3];
+ int ntype = stbvox_facetype[ngeo[0]][(STBVOX_FACE_west+nrot[0])&3];
+ visible_faces |= ((stbvox_face_visible[type]) >> (ntype + 5 - STBVOX_FACE_east)) & (1 << STBVOX_FACE_east);
+ }
+ if (visible_base & (1 << STBVOX_FACE_north)) {
+ int type = stbvox_facetype[ geo ][(STBVOX_FACE_north+ rot )&3];
+ int ntype = stbvox_facetype[ngeo[1]][(STBVOX_FACE_south+nrot[1])&3];
+ visible_faces |= ((stbvox_face_visible[type]) >> (ntype + 5 - STBVOX_FACE_north)) & (1 << STBVOX_FACE_north);
+ }
+ if (visible_base & (1 << STBVOX_FACE_west)) {
+ int type = stbvox_facetype[ geo ][(STBVOX_FACE_west+ rot )&3];
+ int ntype = stbvox_facetype[ngeo[2]][(STBVOX_FACE_east+nrot[2])&3];
+ visible_faces |= ((stbvox_face_visible[type]) >> (ntype + 5 - STBVOX_FACE_west)) & (1 << STBVOX_FACE_west);
+ }
+ if (visible_base & (1 << STBVOX_FACE_south)) {
+ int type = stbvox_facetype[ geo ][(STBVOX_FACE_south+ rot )&3];
+ int ntype = stbvox_facetype[ngeo[3]][(STBVOX_FACE_north+nrot[3])&3];
+ visible_faces |= ((stbvox_face_visible[type]) >> (ntype + 5 - STBVOX_FACE_south)) & (1 << STBVOX_FACE_south);
+ }
+ if (visible_base & (1 << STBVOX_FACE_up)) {
+ int type = stbvox_facetype[ geo ][STBVOX_FACE_up];
+ int ntype = stbvox_facetype[ngeo[4]][STBVOX_FACE_down];
+ visible_faces |= ((stbvox_face_visible[type]) >> (ntype + 5 - STBVOX_FACE_up)) & (1 << STBVOX_FACE_up);
+ }
+ if (visible_base & (1 << STBVOX_FACE_down)) {
+ int type = stbvox_facetype[ geo ][STBVOX_FACE_down];
+ int ntype = stbvox_facetype[ngeo[5]][STBVOX_FACE_up];
+ visible_faces |= ((stbvox_face_visible[type]) >> (ntype + 5 - STBVOX_FACE_down)) & (1 << STBVOX_FACE_down);
+ }
+
+ if (geo == STBVOX_GEOM_force)
+ geo = STBVOX_GEOM_solid;
+
+ assert((geo == STBVOX_GEOM_crossed_pair) ? (visible_faces == 15) : 1);
+
+ // now we finally know for sure which faces are getting generated
+ if (visible_faces == 0)
+ return;
+
+ mesh = mm->default_mesh;
+ if (mm->input.selector)
+ mesh = mm->input.selector[v_off];
+ else if (mm->input.block_selector)
+ mesh = mm->input.block_selector[bt];
+
+ if (geo <= STBVOX_GEOM_ceil_slope_north_is_bottom) {
+ // this is the simple case, we can just use regular block gen with special vmesh calculated with vheight
+ stbvox_mesh_vertex basevert;
+ stbvox_mesh_vertex vmesh[6][4];
+ stbvox_rotate rotate = { 0,0,0,0 };
+ unsigned char simple_rot = rot;
+ int i;
+ // we only need to do this for the displayed faces, but it's easier
+ // to just do it up front; @OPTIMIZE check if it's faster to do it
+ // for visible faces only
+ for (i=0; i < 6*4; ++i) {
+ int vert = stbvox_vertex_selector[0][i];
+ vert = stbvox_rotate_vertex[vert][rot];
+ vmesh[0][i] = stbvox_vmesh_pre_vheight[0][i]
+ + stbvox_geometry_vheight[geo][vert];
+ }
+
+ basevert = stbvox_vertex_encode(pos.x, pos.y, pos.z << STBVOX_CONFIG_PRECISION_Z, 0,0);
+ if (mm->input.selector) {
+ mesh = mm->input.selector[v_off];
+ } else if (mm->input.block_selector)
+ mesh = mm->input.block_selector[bt];
+
+
+ // check if we're going off the end
+ if (mm->output_cur[mesh][0] + mm->output_size[mesh][0]*6 > mm->output_end[mesh][0]) {
+ mm->full = 1;
+ return;
+ }
+
+ if (geo >= STBVOX_GEOM_floor_slope_north_is_top) {
+ if (visible_faces & (1 << STBVOX_FACE_up)) {
+ int normal = geo == STBVOX_GEOM_floor_slope_north_is_top ? stbvox_floor_slope_for_rot[simple_rot] : STBVOX_FACE_up;
+ rotate.facerot = simple_rot;
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_up , v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, normal);
+ }
+ if (visible_faces & (1 << STBVOX_FACE_down)) {
+ int normal = geo == STBVOX_GEOM_ceil_slope_north_is_bottom ? stbvox_ceil_slope_for_rot[simple_rot] : STBVOX_FACE_down;
+ rotate.facerot = (-rotate.facerot) & 3;
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, normal);
+ }
+ } else {
+ if (visible_faces & (1 << STBVOX_FACE_up)) {
+ rotate.facerot = simple_rot;
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_up , v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, STBVOX_FACE_up);
+ }
+ if (visible_faces & (1 << STBVOX_FACE_down)) {
+ rotate.facerot = (-rotate.facerot) & 3;
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, STBVOX_FACE_down);
+ }
+ }
+
+ if (mm->input.rotate) {
+ unsigned char val = mm->input.rotate[v_off];
+ rotate.block = (val >> 0) & 3;
+ rotate.overlay = (val >> 2) & 3;
+ //rotate.tex2 = (val >> 4) & 3;
+ rotate.ecolor = (val >> 6) & 3;
+ } else {
+ rotate.block = rotate.overlay = rotate.ecolor = simple_rot;
+ }
+
+ rotate.facerot = 0;
+
+ if (visible_faces & (1 << STBVOX_FACE_north))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_north, v_off, pos, basevert, vmesh[STBVOX_FACE_north], mesh, STBVOX_FACE_north);
+ if (visible_faces & (1 << STBVOX_FACE_south))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_south, v_off, pos, basevert, vmesh[STBVOX_FACE_south], mesh, STBVOX_FACE_south);
+ if (visible_faces & (1 << STBVOX_FACE_east))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_east , v_off, pos, basevert, vmesh[STBVOX_FACE_east ], mesh, STBVOX_FACE_east);
+ if (visible_faces & (1 << STBVOX_FACE_west))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_west , v_off, pos, basevert, vmesh[STBVOX_FACE_west ], mesh, STBVOX_FACE_west);
+ }
+ if (geo >= STBVOX_GEOM_floor_vheight_03) {
+ // this case can also be generated with regular block gen with special vmesh,
+ // except:
+ // if we want to generate middle diagonal for 'weird' blocks
+ // it's more complicated to detect neighbor matchups
+ stbvox_mesh_vertex vmesh[6][4];
+ stbvox_mesh_vertex cube[8];
+ stbvox_mesh_vertex basevert;
+ stbvox_rotate rotate = { 0,0,0,0 };
+ unsigned char simple_rot = rot;
+ unsigned char ht[4];
+ int extreme;
+
+ // extract the heights
+ #ifdef STBVOX_CONFIG_VHEIGHT_IN_LIGHTING
+ ht[0] = mm->input.lighting[v_off ] & 3;
+ ht[1] = mm->input.lighting[v_off+ew_off ] & 3;
+ ht[2] = mm->input.lighting[v_off +ns_off] & 3;
+ ht[3] = mm->input.lighting[v_off+ew_off+ns_off] & 3;
+ #else
+ if (mm->input.vheight) {
+ unsigned char v = mm->input.vheight[v_off];
+ ht[0] = (v >> 0) & 3;
+ ht[1] = (v >> 2) & 3;
+ ht[2] = (v >> 4) & 3;
+ ht[3] = (v >> 6) & 3;
+ } else if (mm->input.block_vheight) {
+ unsigned char v = mm->input.block_vheight[bt];
+ unsigned char raw[4];
+ int i;
+
+ raw[0] = (v >> 0) & 3;
+ raw[1] = (v >> 2) & 3;
+ raw[2] = (v >> 4) & 3;
+ raw[3] = (v >> 6) & 3;
+
+ for (i=0; i < 4; ++i)
+ ht[i] = raw[stbvox_rotate_vertex[i][rot]];
+ } else if (mm->input.packed_compact) {
+ ht[0] = (mm->input.packed_compact[v_off ] >> 2) & 3;
+ ht[1] = (mm->input.packed_compact[v_off+ew_off ] >> 2) & 3;
+ ht[2] = (mm->input.packed_compact[v_off +ns_off] >> 2) & 3;
+ ht[3] = (mm->input.packed_compact[v_off+ew_off+ns_off] >> 2) & 3;
+ } else if (mm->input.geometry) {
+ ht[0] = mm->input.geometry[v_off ] >> 6;
+ ht[1] = mm->input.geometry[v_off+ew_off ] >> 6;
+ ht[2] = mm->input.geometry[v_off +ns_off] >> 6;
+ ht[3] = mm->input.geometry[v_off+ew_off+ns_off] >> 6;
+ } else {
+ assert(0);
+ }
+ #endif
+
+ // flag whether any sides go off the top of the block, which means
+ // our visible_faces test was wrong
+ extreme = (ht[0] == 3 || ht[1] == 3 || ht[2] == 3 || ht[3] == 3);
+
+ if (geo >= STBVOX_GEOM_ceil_vheight_03) {
+ cube[0] = stbvox_vertex_encode(0,0,ht[0],0,0);
+ cube[1] = stbvox_vertex_encode(0,0,ht[1],0,0);
+ cube[2] = stbvox_vertex_encode(0,0,ht[2],0,0);
+ cube[3] = stbvox_vertex_encode(0,0,ht[3],0,0);
+ cube[4] = stbvox_vertex_encode(0,0,2,0,0);
+ cube[5] = stbvox_vertex_encode(0,0,2,0,0);
+ cube[6] = stbvox_vertex_encode(0,0,2,0,0);
+ cube[7] = stbvox_vertex_encode(0,0,2,0,0);
+ } else {
+ cube[0] = stbvox_vertex_encode(0,0,0,0,0);
+ cube[1] = stbvox_vertex_encode(0,0,0,0,0);
+ cube[2] = stbvox_vertex_encode(0,0,0,0,0);
+ cube[3] = stbvox_vertex_encode(0,0,0,0,0);
+ cube[4] = stbvox_vertex_encode(0,0,ht[0],0,0);
+ cube[5] = stbvox_vertex_encode(0,0,ht[1],0,0);
+ cube[6] = stbvox_vertex_encode(0,0,ht[2],0,0);
+ cube[7] = stbvox_vertex_encode(0,0,ht[3],0,0);
+ }
+ if (!mm->input.vheight && mm->input.block_vheight) {
+ // @TODO: support block vheight here, I've forgotten what needs to be done specially
+ }
+
+ // build vertex mesh
+ {
+ int i;
+ for (i=0; i < 6*4; ++i) {
+ int vert = stbvox_vertex_selector[0][i];
+ vmesh[0][i] = stbvox_vmesh_pre_vheight[0][i]
+ + cube[vert];
+ }
+ }
+
+ basevert = stbvox_vertex_encode(pos.x, pos.y, pos.z << STBVOX_CONFIG_PRECISION_Z, 0,0);
+ // check if we're going off the end
+ if (mm->output_cur[mesh][0] + mm->output_size[mesh][0]*6 > mm->output_end[mesh][0]) {
+ mm->full = 1;
+ return;
+ }
+
+ // @TODO generate split faces
+ if (visible_faces & (1 << STBVOX_FACE_up)) {
+ if (geo >= STBVOX_GEOM_ceil_vheight_03)
+ // flat
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_up , v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, STBVOX_FACE_up);
+ else {
+ #ifndef STBVOX_CONFIG_OPTIMIZED_VHEIGHT
+ // check if it's non-planar
+ if (cube[5] + cube[6] != cube[4] + cube[7]) {
+ // not planar, split along diagonal and make degenerate quads
+ if (geo == STBVOX_GEOM_floor_vheight_03)
+ stbvox_make_03_split_mesh_for_face(mm, rotate, STBVOX_FACE_up, v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, ht);
+ else
+ stbvox_make_12_split_mesh_for_face(mm, rotate, STBVOX_FACE_up, v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, ht);
+ } else
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_up , v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, stbvox_planar_face_up_normal[ht[2]][ht[1]][ht[0]]);
+ #else
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_up , v_off, pos, basevert, vmesh[STBVOX_FACE_up], mesh, stbvox_optimized_face_up_normal[ht[3]][ht[2]][ht[1]][ht[0]]);
+ #endif
+ }
+ }
+ if (visible_faces & (1 << STBVOX_FACE_down)) {
+ if (geo < STBVOX_GEOM_ceil_vheight_03)
+ // flat
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, STBVOX_FACE_down);
+ else {
+ #ifndef STBVOX_CONFIG_OPTIMIZED_VHEIGHT
+ // check if it's non-planar
+ if (cube[1] + cube[2] != cube[0] + cube[3]) {
+ // not planar, split along diagonal and make degenerate quads
+ if (geo == STBVOX_GEOM_ceil_vheight_03)
+ stbvox_make_03_split_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, ht);
+ else
+ stbvox_make_12_split_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, ht);
+ } else
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, stbvox_reverse_face[stbvox_planar_face_up_normal[ht[2]][ht[1]][ht[0]]]);
+ #else
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_down, v_off, pos, basevert, vmesh[STBVOX_FACE_down], mesh, stbvox_reverse_face[stbvox_optimized_face_up_normal[ht[3]][ht[2]][ht[1]][ht[0]]]);
+ #endif
+ }
+ }
+
+ if (mm->input.rotate) {
+ unsigned char val = mm->input.rotate[v_off];
+ rotate.block = (val >> 0) & 3;
+ rotate.overlay = (val >> 2) & 3;
+ //rotate.tex2 = (val >> 4) & 3;
+ rotate.ecolor = (val >> 6) & 3;
+ } else if (mm->input.selector) {
+ rotate.block = rotate.overlay = rotate.ecolor = simple_rot;
+ }
+
+ if ((visible_faces & (1 << STBVOX_FACE_north)) || (extreme && (ht[2] == 3 || ht[3] == 3)))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_north, v_off, pos, basevert, vmesh[STBVOX_FACE_north], mesh, STBVOX_FACE_north);
+ if ((visible_faces & (1 << STBVOX_FACE_south)) || (extreme && (ht[0] == 3 || ht[1] == 3)))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_south, v_off, pos, basevert, vmesh[STBVOX_FACE_south], mesh, STBVOX_FACE_south);
+ if ((visible_faces & (1 << STBVOX_FACE_east)) || (extreme && (ht[1] == 3 || ht[3] == 3)))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_east , v_off, pos, basevert, vmesh[STBVOX_FACE_east ], mesh, STBVOX_FACE_east);
+ if ((visible_faces & (1 << STBVOX_FACE_west)) || (extreme && (ht[0] == 3 || ht[2] == 3)))
+ stbvox_make_mesh_for_face(mm, rotate, STBVOX_FACE_west , v_off, pos, basevert, vmesh[STBVOX_FACE_west ], mesh, STBVOX_FACE_west);
+ }
+
+ if (geo == STBVOX_GEOM_crossed_pair) {
+ // this can be generated with a special vmesh
+ stbvox_mesh_vertex basevert = stbvox_vertex_encode(pos.x, pos.y, pos.z << STBVOX_CONFIG_PRECISION_Z , 0,0);
+ unsigned char simple_rot=0;
+ stbvox_rotate rot = { 0,0,0,0 };
+ unsigned char mesh = mm->default_mesh;
+ if (mm->input.selector) {
+ mesh = mm->input.selector[v_off];
+ simple_rot = mesh >> 4;
+ mesh &= 15;
+ }
+ if (mm->input.block_selector) {
+ mesh = mm->input.block_selector[bt];
+ }
+
+ // check if we're going off the end
+ if (mm->output_cur[mesh][0] + mm->output_size[mesh][0]*4 > mm->output_end[mesh][0]) {
+ mm->full = 1;
+ return;
+ }
+
+ if (mm->input.rotate) {
+ unsigned char val = mm->input.rotate[v_off];
+ rot.block = (val >> 0) & 3;
+ rot.overlay = (val >> 2) & 3;
+ //rot.tex2 = (val >> 4) & 3;
+ rot.ecolor = (val >> 6) & 3;
+ } else if (mm->input.selector) {
+ rot.block = rot.overlay = rot.ecolor = simple_rot;
+ }
+ rot.facerot = 0;
+
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_north, v_off, pos, basevert, stbvox_vmesh_crossed_pair[STBVOX_FACE_north], mesh, STBVF_ne_u_cross);
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_south, v_off, pos, basevert, stbvox_vmesh_crossed_pair[STBVOX_FACE_south], mesh, STBVF_sw_u_cross);
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_east , v_off, pos, basevert, stbvox_vmesh_crossed_pair[STBVOX_FACE_east ], mesh, STBVF_se_u_cross);
+ stbvox_make_mesh_for_face(mm, rot, STBVOX_FACE_west , v_off, pos, basevert, stbvox_vmesh_crossed_pair[STBVOX_FACE_west ], mesh, STBVF_nw_u_cross);
+ }
+
+
+ // @TODO
+ // STBVOX_GEOM_floor_slope_north_is_top_as_wall,
+ // STBVOX_GEOM_ceil_slope_north_is_bottom_as_wall,
+}
+
+static void stbvox_make_mesh_for_column(stbvox_mesh_maker *mm, int x, int y, int z0)
+{
+ stbvox_pos pos;
+ int v_off = x * mm->x_stride_in_bytes + y * mm->y_stride_in_bytes;
+ int ns_off = mm->y_stride_in_bytes;
+ int ew_off = mm->x_stride_in_bytes;
+ pos.x = x;
+ pos.y = y;
+ pos.z = 0;
+ if (mm->input.geometry) {
+ unsigned char *bt = mm->input.blocktype + v_off;
+ unsigned char *geo = mm->input.geometry + v_off;
+ int z;
+ for (z=z0; z < mm->z1; ++z) {
+ if (bt[z] && ( !bt[z+ns_off] || !STBVOX_GET_GEO(geo[z+ns_off]) || !bt[z-ns_off] || !STBVOX_GET_GEO(geo[z-ns_off])
+ || !bt[z+ew_off] || !STBVOX_GET_GEO(geo[z+ew_off]) || !bt[z-ew_off] || !STBVOX_GET_GEO(geo[z-ew_off])
+ || !bt[z-1] || !STBVOX_GET_GEO(geo[z-1]) || !bt[z+1] || !STBVOX_GET_GEO(geo[z+1])))
+ { // TODO check up and down
+ pos.z = z;
+ stbvox_make_mesh_for_block_with_geo(mm, pos, v_off+z);
+ if (mm->full) {
+ mm->cur_z = z;
+ return;
+ }
+ }
+ }
+ } else if (mm->input.block_geometry) {
+ int z;
+ unsigned char *bt = mm->input.blocktype + v_off;
+ unsigned char *geo = mm->input.block_geometry;
+ for (z=z0; z < mm->z1; ++z) {
+ if (bt[z] && ( geo[bt[z+ns_off]] != STBVOX_GEOM_solid
+ || geo[bt[z-ns_off]] != STBVOX_GEOM_solid
+ || geo[bt[z+ew_off]] != STBVOX_GEOM_solid
+ || geo[bt[z-ew_off]] != STBVOX_GEOM_solid
+ || geo[bt[z-1]] != STBVOX_GEOM_solid
+ || geo[bt[z+1]] != STBVOX_GEOM_solid))
+ {
+ pos.z = z;
+ stbvox_make_mesh_for_block_with_geo(mm, pos, v_off+z);
+ if (mm->full) {
+ mm->cur_z = z;
+ return;
+ }
+ }
+ }
+ } else {
+ unsigned char *bt = mm->input.blocktype + v_off;
+ int z;
+ #if STBVOX_CONFIG_PRECISION_Z == 1
+ stbvox_mesh_vertex *vmesh = stbvox_vmesh_delta_half_z[0];
+ #else
+ stbvox_mesh_vertex *vmesh = stbvox_vmesh_delta_normal[0];
+ #endif
+ for (z=z0; z < mm->z1; ++z) {
+ // if it's solid and at least one neighbor isn't solid
+ if (bt[z] && (!bt[z+ns_off] || !bt[z-ns_off] || !bt[z+ew_off] || !bt[z-ew_off] || !bt[z-1] || !bt[z+1])) {
+ pos.z = z;
+ stbvox_make_mesh_for_block(mm, pos, v_off+z, vmesh);
+ if (mm->full) {
+ mm->cur_z = z;
+ return;
+ }
+ }
+ }
+ }
+}
+
+static void stbvox_bring_up_to_date(stbvox_mesh_maker *mm)
+{
+ if (mm->config_dirty) {
+ int i;
+ #ifdef STBVOX_ICONFIG_FACE_ATTRIBUTE
+ mm->num_mesh_slots = 1;
+ for (i=0; i < STBVOX_MAX_MESHES; ++i) {
+ mm->output_size[i][0] = 32;
+ mm->output_step[i][0] = 8;
+ }
+ #else
+ mm->num_mesh_slots = 2;
+ for (i=0; i < STBVOX_MAX_MESHES; ++i) {
+ mm->output_size[i][0] = 16;
+ mm->output_step[i][0] = 4;
+ mm->output_size[i][1] = 4;
+ mm->output_step[i][1] = 4;
+ }
+ #endif
+
+ mm->config_dirty = 0;
+ }
+}
+
+int stbvox_make_mesh(stbvox_mesh_maker *mm)
+{
+ int x,y;
+ stbvox_bring_up_to_date(mm);
+ mm->full = 0;
+ if (mm->cur_x > mm->x0 || mm->cur_y > mm->y0 || mm->cur_z > mm->z0) {
+ stbvox_make_mesh_for_column(mm, mm->cur_x, mm->cur_y, mm->cur_z);
+ if (mm->full)
+ return 0;
+ ++mm->cur_y;
+ while (mm->cur_y < mm->y1 && !mm->full) {
+ stbvox_make_mesh_for_column(mm, mm->cur_x, mm->cur_y, mm->z0);
+ if (mm->full)
+ return 0;
+ ++mm->cur_y;
+ }
+ ++mm->cur_x;
+ }
+ for (x=mm->cur_x; x < mm->x1; ++x) {
+ for (y=mm->y0; y < mm->y1; ++y) {
+ stbvox_make_mesh_for_column(mm, x, y, mm->z0);
+ if (mm->full) {
+ mm->cur_x = x;
+ mm->cur_y = y;
+ return 0;
+ }
+ }
+ }
+ return 1;
+}
+
+void stbvox_init_mesh_maker(stbvox_mesh_maker *mm)
+{
+ memset(mm, 0, sizeof(*mm));
+ stbvox_build_default_palette();
+
+ mm->config_dirty = 1;
+ mm->default_mesh = 0;
+}
+
+int stbvox_get_buffer_count(stbvox_mesh_maker *mm)
+{
+ stbvox_bring_up_to_date(mm);
+ return mm->num_mesh_slots;
+}
+
+int stbvox_get_buffer_size_per_quad(stbvox_mesh_maker *mm, int n)
+{
+ return mm->output_size[0][n];
+}
+
+void stbvox_reset_buffers(stbvox_mesh_maker *mm)
+{
+ int i;
+ for (i=0; i < STBVOX_MAX_MESHES*STBVOX_MAX_MESH_SLOTS; ++i) {
+ mm->output_cur[0][i] = 0;
+ mm->output_buffer[0][i] = 0;
+ }
+}
+
+void stbvox_set_buffer(stbvox_mesh_maker *mm, int mesh, int slot, void *buffer, size_t len)
+{
+ int i;
+ stbvox_bring_up_to_date(mm);
+ mm->output_buffer[mesh][slot] = (char *) buffer;
+ mm->output_cur [mesh][slot] = (char *) buffer;
+ mm->output_len [mesh][slot] = (int) len;
+ mm->output_end [mesh][slot] = (char *) buffer + len;
+ for (i=0; i < STBVOX_MAX_MESH_SLOTS; ++i) {
+ if (mm->output_buffer[mesh][i]) {
+ assert(mm->output_len[mesh][i] / mm->output_size[mesh][i] == mm->output_len[mesh][slot] / mm->output_size[mesh][slot]);
+ }
+ }
+}
+
+void stbvox_set_default_mesh(stbvox_mesh_maker *mm, int mesh)
+{
+ mm->default_mesh = mesh;
+}
+
+int stbvox_get_quad_count(stbvox_mesh_maker *mm, int mesh)
+{
+ return (int) ((mm->output_cur[mesh][0] - mm->output_buffer[mesh][0]) / mm->output_size[mesh][0]);
+}
+
+stbvox_input_description *stbvox_get_input_description(stbvox_mesh_maker *mm)
+{
+ return &mm->input;
+}
+
+void stbvox_set_input_range(stbvox_mesh_maker *mm, int x0, int y0, int z0, int x1, int y1, int z1)
+{
+ mm->x0 = x0;
+ mm->y0 = y0;
+ mm->z0 = z0;
+
+ mm->x1 = x1;
+ mm->y1 = y1;
+ mm->z1 = z1;
+
+ mm->cur_x = x0;
+ mm->cur_y = y0;
+ mm->cur_z = z0;
+
+ // @TODO validate that this range is representable in this mode
+}
+
+void stbvox_get_transform(stbvox_mesh_maker *mm, float transform[3][3])
+{
+ // scale
+ transform[0][0] = 1.0;
+ transform[0][1] = 1.0;
+ #if STBVOX_CONFIG_PRECISION_Z==1
+ transform[0][2] = 0.5f;
+ #else
+ transform[0][2] = 1.0f;
+ #endif
+ // translation
+ transform[1][0] = (float) (mm->pos_x);
+ transform[1][1] = (float) (mm->pos_y);
+ transform[1][2] = (float) (mm->pos_z);
+ // texture coordinate projection translation
+ transform[2][0] = (float) (mm->pos_x & 255); // @TODO depends on max texture scale
+ transform[2][1] = (float) (mm->pos_y & 255);
+ transform[2][2] = (float) (mm->pos_z & 255);
+}
+
+void stbvox_get_bounds(stbvox_mesh_maker *mm, float bounds[2][3])
+{
+ bounds[0][0] = (float) (mm->pos_x + mm->x0);
+ bounds[0][1] = (float) (mm->pos_y + mm->y0);
+ bounds[0][2] = (float) (mm->pos_z + mm->z0);
+ bounds[1][0] = (float) (mm->pos_x + mm->x1);
+ bounds[1][1] = (float) (mm->pos_y + mm->y1);
+ bounds[1][2] = (float) (mm->pos_z + mm->z1);
+}
+
+void stbvox_set_mesh_coordinates(stbvox_mesh_maker *mm, int x, int y, int z)
+{
+ mm->pos_x = x;
+ mm->pos_y = y;
+ mm->pos_z = z;
+}
+
+void stbvox_set_input_stride(stbvox_mesh_maker *mm, int x_stride_in_bytes, int y_stride_in_bytes)
+{
+ int f,v;
+ mm->x_stride_in_bytes = x_stride_in_bytes;
+ mm->y_stride_in_bytes = y_stride_in_bytes;
+ for (f=0; f < 6; ++f) {
+ for (v=0; v < 4; ++v) {
+ mm->cube_vertex_offset[f][v] = stbvox_vertex_vector[f][v][0] * mm->x_stride_in_bytes
+ + stbvox_vertex_vector[f][v][1] * mm->y_stride_in_bytes
+ + stbvox_vertex_vector[f][v][2] ;
+ mm->vertex_gather_offset[f][v] = (stbvox_vertex_vector[f][v][0]-1) * mm->x_stride_in_bytes
+ + (stbvox_vertex_vector[f][v][1]-1) * mm->y_stride_in_bytes
+ + (stbvox_vertex_vector[f][v][2]-1) ;
+ }
+ }
+}
+
+/////////////////////////////////////////////////////////////////////////////
+//
+// offline computation of tables
+//
+
+#if 0
+// compute optimized vheight table
+static char *normal_names[32] =
+{
+ 0,0,0,0,"u ",0, "eu ",0,
+ 0,0,0,0,"ne_u",0, "nu ",0,
+ 0,0,0,0,"nw_u",0, "wu ",0,
+ 0,0,0,0,"sw_u",0, "su ",0,
+};
+
+static char *find_best_normal(float x, float y, float z)
+{
+ int best_slot = 4;
+ float best_dot = 0;
+ int i;
+ for (i=0; i < 32; ++i) {
+ if (normal_names[i]) {
+ float dot = x * stbvox_default_normals[i][0] + y * stbvox_default_normals[i][1] + z * stbvox_default_normals[i][2];
+ if (dot > best_dot) {
+ best_dot = dot;
+ best_slot = i;
+ }
+ }
+ }
+ return normal_names[best_slot];
+}
+
+int main(int argc, char **argv)
+{
+ int sw,se,nw,ne;
+ for (ne=0; ne < 4; ++ne) {
+ for (nw=0; nw < 4; ++nw) {
+ for (se=0; se < 4; ++se) {
+ printf(" { ");
+ for (sw=0; sw < 4; ++sw) {
+ float x = (float) (nw + sw - ne - se);
+ float y = (float) (sw + se - nw - ne);
+ float z = 2;
+ printf("STBVF_%s, ", find_best_normal(x,y,z));
+ }
+ printf("},\n");
+ }
+ }
+ }
+ return 0;
+}
+#endif
+
+// @TODO
+//
+// - test API for texture rotation on side faces
+// - API for texture rotation on top & bottom
+// - better culling of vheight faces with vheight neighbors
+// - better culling of non-vheight faces with vheight neighbors
+// - gather vertex lighting from slopes correctly
+// - better support texture edge_clamp: currently if you fall
+// exactly on 1.0 you get wrapped incorrectly; this is rare, but
+// can be avoided: compute texcoords in the vertex shader and offset
+// towards center before modding (needs 2 bits per vertex to know the offset direction)
+// - other mesh modes (10,6,4-byte quads)
+//
+//
+// With TexBuffer for the fixed vertex data, we can actually do
+// minecrafty non-blocks like stairs -- we still probably only
+// want 256 or so, so we can't do the equivalent of all the vheight
+// combos, but that's ok. The 256 includes baked rotations, but only
+// some of them need it, and lots of block types share some faces.
+//
+// mode 5 (6 bytes): mode 6 (6 bytes)
+// x:7 x:6
+// y:7 y:6
+// z:6 z:6
+// tex1:8 tex1:8
+// tex2:8 tex2:7
+// color:8 color:8
+// face:4 face:7
+//
+//
+// side faces (all x4) top&bottom faces (2x) internal faces (1x)
+// 1 regular 1 regular
+// 2 slabs 2
+// 8 stairs 4 stairs 16
+// 4 diag side 8
+// 4 upper diag side 8
+// 4 lower diag side 8
+// 4 crossed pairs
+//
+// 23*4 + 5*4 + 46
+// == 92 + 20 + 46 = 158
+//
+// Must drop 30 of them to fit in 7 bits:
+// ceiling half diagonals: 16+8 = 24
+// Need to get rid of 6 more.
+// ceiling diagonals: 8+4 = 12
+// This brings it to 122, so can add a crossed-pair variant.
+// (diagonal and non-diagonal, or randomly offset)
+// Or carpet, which would be 5 more.
+//
+//
+// Mode 4 (10 bytes):
+// v: z:2,light:6
+// f: x:6,y:6,z:7, t1:8,t2:8,c:8,f:5
+//
+// Mode ? (10 bytes)
+// v: xyz:5 (27 values), light:3
+// f: x:7,y:7,z:6, t1:8,t2:8,c:8,f:4
+// (v: x:2,y:2,z:2,light:2)
+
+#endif // STB_VOXEL_RENDER_IMPLEMENTATION
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/vendor/stb/tests/Makefile b/vendor/stb/tests/Makefile
new file mode 100644
index 0000000..1782ea6
--- /dev/null
+++ b/vendor/stb/tests/Makefile
@@ -0,0 +1,12 @@
+INCLUDES = -I..
+CFLAGS = -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -DSTB_DIVIDE_TEST
+CPPFLAGS = -Wno-write-strings -DSTB_DIVIDE_TEST
+
+# Uncomment this line for reproducing OSS-Fuzz bugs with image_fuzzer
+#CFLAGS += -O -fsanitize=address
+
+all:
+ $(CC) $(INCLUDES) $(CFLAGS) ../stb_vorbis.c test_c_compilation.c test_c_lexer.c test_dxt.c test_easyfont.c test_image.c test_image_write.c test_perlin.c test_sprintf.c test_truetype.c test_voxel.c -lm
+ $(CC) $(INCLUDES) $(CPPFLAGS) -std=c++0x test_cpp_compilation.cpp -lm -lstdc++
+ $(CC) $(INCLUDES) $(CFLAGS) -DIWT_TEST image_write_test.c -lm -o image_write_test
+ $(CC) $(INCLUDES) $(CFLAGS) fuzz_main.c stbi_read_fuzzer.c -lm -o image_fuzzer
diff --git a/vendor/stb/tests/c_lexer_test.c b/vendor/stb/tests/c_lexer_test.c
new file mode 100644
index 0000000..a919b6c
--- /dev/null
+++ b/vendor/stb/tests/c_lexer_test.c
@@ -0,0 +1,50 @@
+#define STB_C_LEX_C_DECIMAL_INTS Y // "0|[1-9][0-9]*" CLEX_intlit
+#define STB_C_LEX_C_HEX_INTS Y // "0x[0-9a-fA-F]+" CLEX_intlit
+#define STB_C_LEX_C_OCTAL_INTS Y // "[0-7]+" CLEX_intlit
+#define STB_C_LEX_C_DECIMAL_FLOATS Y // "[0-9]*(.[0-9]*([eE][-+]?[0-9]+)?) CLEX_floatlit
+#define STB_C_LEX_C99_HEX_FLOATS Y // "0x{hex}+(.{hex}*)?[pP][-+]?{hex}+ CLEX_floatlit
+#define STB_C_LEX_C_IDENTIFIERS Y // "[_a-zA-Z][_a-zA-Z0-9]*" CLEX_id
+#define STB_C_LEX_C_DQ_STRINGS Y // double-quote-delimited strings with escapes CLEX_dqstring
+#define STB_C_LEX_C_SQ_STRINGS Y // single-quote-delimited strings with escapes CLEX_ssstring
+#define STB_C_LEX_C_CHARS Y // single-quote-delimited character with escape CLEX_charlits
+#define STB_C_LEX_C_COMMENTS Y // "/* comment */"
+#define STB_C_LEX_CPP_COMMENTS Y // "// comment to end of line\n"
+#define STB_C_LEX_C_COMPARISONS Y // "==" CLEX_eq "!=" CLEX_noteq "<=" CLEX_lesseq ">=" CLEX_greatereq
+#define STB_C_LEX_C_LOGICAL Y // "&&" CLEX_andand "||" CLEX_oror
+#define STB_C_LEX_C_SHIFTS Y // "<<" CLEX_shl ">>" CLEX_shr
+#define STB_C_LEX_C_INCREMENTS Y // "++" CLEX_plusplus "--" CLEX_minusminus
+#define STB_C_LEX_C_ARROW Y // "->" CLEX_arrow
+#define STB_C_LEX_EQUAL_ARROW Y // "=>" CLEX_eqarrow
+#define STB_C_LEX_C_BITWISEEQ Y // "&=" CLEX_andeq "|=" CLEX_oreq "^=" CLEX_xoreq
+#define STB_C_LEX_C_ARITHEQ Y // "+=" CLEX_pluseq "-=" CLEX_minuseq
+ // "*=" CLEX_muleq "/=" CLEX_diveq "%=" CLEX_modeq
+ // if both STB_C_LEX_SHIFTS & STB_C_LEX_ARITHEQ:
+ // "<<=" CLEX_shleq ">>=" CLEX_shreq
+
+#define STB_C_LEX_PARSE_SUFFIXES Y // letters after numbers are parsed as part of those numbers, and must be in suffix list below
+#define STB_C_LEX_DECIMAL_SUFFIXES "uUlL" // decimal integer suffixes e.g. "uUlL" -- these are returned as-is in string storage
+#define STB_C_LEX_HEX_SUFFIXES "lL" // e.g. "uUlL"
+#define STB_C_LEX_OCTAL_SUFFIXES "lL" // e.g. "uUlL"
+#define STB_C_LEX_FLOAT_SUFFIXES "uulL" //
+
+#define STB_C_LEX_0_IS_EOF N // if Y, ends parsing at '\0'; if N, returns '\0' as token
+#define STB_C_LEX_INTEGERS_AS_DOUBLES N // parses integers as doubles so they can be larger than 'int', but only if STB_C_LEX_STDLIB==N
+#define STB_C_LEX_MULTILINE_DSTRINGS Y // allow newlines in double-quoted strings
+#define STB_C_LEX_MULTILINE_SSTRINGS Y // allow newlines in single-quoted strings
+#define STB_C_LEX_USE_STDLIB N // use strtod,strtol for parsing #s; otherwise inaccurate hack
+#define STB_C_LEX_DOLLAR_IDENTIFIER Y // allow $ as an identifier character
+#define STB_C_LEX_FLOAT_NO_DECIMAL Y // allow floats that have no decimal point if they have an exponent
+
+#define STB_C_LEX_DEFINE_ALL_TOKEN_NAMES Y // if Y, all CLEX_ token names are defined, even if never returned
+ // leaving it as N should help you catch config bugs
+
+#define STB_C_LEX_DISCARD_PREPROCESSOR Y // discard C-preprocessor directives (e.g. after preprocess
+ // still have #line, #pragma, etc)
+
+#define STB_C_LEXER_DEFINITIONS // This line prevents the header file from replacing your definitions
+
+
+
+#define STB_C_LEXER_IMPLEMENTATION
+#define STB_C_LEXER_SELF_TEST
+#include "../stb_c_lexer.h"
diff --git a/vendor/stb/tests/c_lexer_test.dsp b/vendor/stb/tests/c_lexer_test.dsp
new file mode 100644
index 0000000..13f8758
--- /dev/null
+++ b/vendor/stb/tests/c_lexer_test.dsp
@@ -0,0 +1,89 @@
+# Microsoft Developer Studio Project File - Name="c_lexer_test" - Package Owner=<4>
+# Microsoft Developer Studio Generated Build File, Format Version 6.00
+# ** DO NOT EDIT **
+
+# TARGTYPE "Win32 (x86) Console Application" 0x0103
+
+CFG=c_lexer_test - Win32 Debug
+!MESSAGE This is not a valid makefile. To build this project using NMAKE,
+!MESSAGE use the Export Makefile command and run
+!MESSAGE
+!MESSAGE NMAKE /f "c_lexer_test.mak".
+!MESSAGE
+!MESSAGE You can specify a configuration when running NMAKE
+!MESSAGE by defining the macro CFG on the command line. For example:
+!MESSAGE
+!MESSAGE NMAKE /f "c_lexer_test.mak" CFG="c_lexer_test - Win32 Debug"
+!MESSAGE
+!MESSAGE Possible choices for configuration are:
+!MESSAGE
+!MESSAGE "c_lexer_test - Win32 Release" (based on "Win32 (x86) Console Application")
+!MESSAGE "c_lexer_test - Win32 Debug" (based on "Win32 (x86) Console Application")
+!MESSAGE
+
+# Begin Project
+# PROP AllowPerConfigDependencies 0
+# PROP Scc_ProjName ""
+# PROP Scc_LocalPath ""
+CPP=cl.exe
+RSC=rc.exe
+
+!IF "$(CFG)" == "c_lexer_test - Win32 Release"
+
+# PROP BASE Use_MFC 0
+# PROP BASE Use_Debug_Libraries 0
+# PROP BASE Output_Dir "Release"
+# PROP BASE Intermediate_Dir "Release"
+# PROP BASE Target_Dir ""
+# PROP Use_MFC 0
+# PROP Use_Debug_Libraries 0
+# PROP Output_Dir "Release"
+# PROP Intermediate_Dir "Release"
+# PROP Target_Dir ""
+# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
+# ADD CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
+# ADD BASE RSC /l 0x409 /d "NDEBUG"
+# ADD RSC /l 0x409 /d "NDEBUG"
+BSC32=bscmake.exe
+# ADD BASE BSC32 /nologo
+# ADD BSC32 /nologo
+LINK32=link.exe
+# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386
+# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386
+
+!ELSEIF "$(CFG)" == "c_lexer_test - Win32 Debug"
+
+# PROP BASE Use_MFC 0
+# PROP BASE Use_Debug_Libraries 1
+# PROP BASE Output_Dir "Debug"
+# PROP BASE Intermediate_Dir "Debug"
+# PROP BASE Target_Dir ""
+# PROP Use_MFC 0
+# PROP Use_Debug_Libraries 1
+# PROP Output_Dir "Debug"
+# PROP Intermediate_Dir "Debug\c_lexer_test"
+# PROP Target_Dir ""
+# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /GZ /c
+# ADD CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /FD /GZ /c
+# SUBTRACT CPP /YX
+# ADD BASE RSC /l 0x409 /d "_DEBUG"
+# ADD RSC /l 0x409 /d "_DEBUG"
+BSC32=bscmake.exe
+# ADD BASE BSC32 /nologo
+# ADD BSC32 /nologo
+LINK32=link.exe
+# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept
+# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept
+
+!ENDIF
+
+# Begin Target
+
+# Name "c_lexer_test - Win32 Release"
+# Name "c_lexer_test - Win32 Debug"
+# Begin Source File
+
+SOURCE=.\c_lexer_test.c
+# End Source File
+# End Target
+# End Project
diff --git a/vendor/stb/tests/caveview/README.md b/vendor/stb/tests/caveview/README.md
new file mode 100644
index 0000000..10da838
--- /dev/null
+++ b/vendor/stb/tests/caveview/README.md
@@ -0,0 +1,85 @@
+# FAQ
+
+### How to run it?
+
+There's no GUI. Find a directory with Minecraft Anvil files (.mca).
+Copy a Minecraft "terrain.png" into that directory (do a Google
+image search). Run from that directory.
+
+### How accurate is this as a Minecraft viewer?
+
+Not very. Many Minecraft blocks are not handled correctly:
+
+* No redstone, rails, or other "flat" blocks
+* No signs, doors, fences, carpets, or other complicated geometry
+* Stairs are turned into ramps
+* Upper slabs turn into lower slabs
+* Wood types only for blocks, not stairs, slabs, etc
+* Colored glass becomes regular glass
+* Glass panes become glass blocks
+* Water is opaque
+* Water level is incorrect
+* No biome coloration
+* Cactus is not shrunk, shows holes
+* Chests are not shrunk
+* Double-chests draw as two chests
+* Pumpkins etc. are not rotated properly
+* Torches are drawn hackily, do not attach to walls
+* Incorrect textures for blocks that postdate terrain.png
+* Transparent textures have black fringes due to non-premultiplied-alpha
+* Skylight and block light are combined in a single value
+* Only blocks at y=1..255 are shown (not y=0)
+* If a 32x32x256 "quad-chunk" needs more than 800K quads, it isn't handled (very unlikely)
+
+Some of these are due to engine limitations, and some of
+these are because I didn't make the effort since my
+goal was to make a demo for stb_voxel_render.h, not
+to make a proper Minecraft viewer.
+
+
+### Could this be turned into a proper Minecraft viewer?
+
+Yes and no. Yes, you could do it, but no, it wouldn't
+really resemble this code that much anymore.
+
+You could certainly use this engine to
+render the parts of Minecraft it works for, but many
+of the things it doesn't handle it can't handle at all
+(stairs, water, fences, carpets, etc) because it uses
+low-precision coordinates to store voxel data.
+
+You would have to render all of the stuff it doesn't
+handle through another rendering path. In a game (not
+a viewer) you would need such a path for movable entities
+like doors and carts anyway, so possibly handling other
+things that way wouldn't be so bad.
+
+Rails, ladders, and redstone lines could be implemented by
+using tex2 to overlay those effects, but you can't rotate
+tex1 and tex2 independently, so there may be cases where
+the underlying texture needs a different rotation from the
+overlaid texture, which would require separate rendering.
+Handling redstone's brightness being different from underlying
+block's brightness would require separate rendering.
+
+You can use the face-color effect to do biome coloration,
+but the change won't be smooth the way it is in Minecraft.
+
+
+### Why isn't building the mesh data faster?
+
+Partly because converting from minecraft data is expensive.
+
+Here is the approximate breakdown of an older version
+of this executable and lib that did the building single-threaded.
+
+* 25% loading & parsing minecraft files (4/5ths of this is my crappy zlib)
+* 18% converting from minecraft blockids & lighting to stb blockids & lighting
+* 10% reordering from data[z][y]\[x] (minecraft-style) to data[y][x]\[z] (stb-style)
+* 40% building mesh data
+* 7% uploading mesh data to OpenGL
+
+I did do significant optimizations after the above, so the
+final breakdown is different, but it should give you some
+sense of the costs.
+
diff --git a/vendor/stb/tests/caveview/cave_main.c b/vendor/stb/tests/caveview/cave_main.c
new file mode 100644
index 0000000..d345cf1
--- /dev/null
+++ b/vendor/stb/tests/caveview/cave_main.c
@@ -0,0 +1,598 @@
+#define _WIN32_WINNT 0x400
+
+#include
+#include
+
+// stb.h
+#define STB_DEFINE
+#include "stb.h"
+
+// stb_gl.h
+#define STB_GL_IMPLEMENTATION
+#define STB_GLEXT_DEFINE "glext_list.h"
+#include "stb_gl.h"
+
+// SDL
+#include "sdl.h"
+#include "SDL_opengl.h"
+
+// stb_glprog.h
+#define STB_GLPROG_IMPLEMENTATION
+#define STB_GLPROG_ARB_DEFINE_EXTENSIONS
+#include "stb_glprog.h"
+
+// stb_image.h
+#define STB_IMAGE_IMPLEMENTATION
+#include "stb_image.h"
+
+// stb_easy_font.h
+#include "stb_easy_font.h" // doesn't require an IMPLEMENTATION
+
+#include "caveview.h"
+
+char *game_name = "caveview";
+
+
+#define REVERSE_DEPTH
+
+
+
+static void print_string(float x, float y, char *text, float r, float g, float b)
+{
+ static char buffer[99999];
+ int num_quads;
+
+ num_quads = stb_easy_font_print(x, y, text, NULL, buffer, sizeof(buffer));
+
+ glColor3f(r,g,b);
+ glEnableClientState(GL_VERTEX_ARRAY);
+ glVertexPointer(2, GL_FLOAT, 16, buffer);
+ glDrawArrays(GL_QUADS, 0, num_quads*4);
+ glDisableClientState(GL_VERTEX_ARRAY);
+}
+
+static float text_color[3];
+static float pos_x = 10;
+static float pos_y = 10;
+
+static void print(char *text, ...)
+{
+ char buffer[999];
+ va_list va;
+ va_start(va, text);
+ vsprintf(buffer, text, va);
+ va_end(va);
+ print_string(pos_x, pos_y, buffer, text_color[0], text_color[1], text_color[2]);
+ pos_y += 10;
+}
+
+float camang[3], camloc[3] = { 60,22,77 };
+float player_zoom = 1.0;
+float rotate_view = 0.0;
+
+
+void camera_to_worldspace(float world[3], float cam_x, float cam_y, float cam_z)
+{
+ float vec[3] = { cam_x, cam_y, cam_z };
+ float t[3];
+ float s,c;
+ s = (float) sin(camang[0]*3.141592/180);
+ c = (float) cos(camang[0]*3.141592/180);
+
+ t[0] = vec[0];
+ t[1] = c*vec[1] - s*vec[2];
+ t[2] = s*vec[1] + c*vec[2];
+
+ s = (float) sin(camang[2]*3.141592/180);
+ c = (float) cos(camang[2]*3.141592/180);
+ world[0] = c*t[0] - s*t[1];
+ world[1] = s*t[0] + c*t[1];
+ world[2] = t[2];
+}
+
+// camera worldspace velocity
+float cam_vel[3];
+
+int controls;
+
+#define MAX_VEL 150.0f // blocks per second
+#define ACCEL 6.0f
+#define DECEL 3.0f
+
+#define STATIC_FRICTION DECEL
+#define EFFECTIVE_ACCEL (ACCEL+DECEL)
+
+// dynamic friction:
+//
+// if going at MAX_VEL, ACCEL and friction must cancel
+// EFFECTIVE_ACCEL = DECEL + DYNAMIC_FRIC*MAX_VEL
+#define DYNAMIC_FRICTION (ACCEL/(float)MAX_VEL)
+
+float view_x_vel = 0;
+float view_z_vel = 0;
+float pending_view_x;
+float pending_view_z;
+
+void process_tick_raw(float dt)
+{
+ int i;
+ float thrust[3] = { 0,0,0 };
+ float world_thrust[3];
+
+ // choose direction to apply thrust
+
+ thrust[0] = (controls & 3)== 1 ? EFFECTIVE_ACCEL : (controls & 3)== 2 ? -EFFECTIVE_ACCEL : 0;
+ thrust[1] = (controls & 12)== 4 ? EFFECTIVE_ACCEL : (controls & 12)== 8 ? -EFFECTIVE_ACCEL : 0;
+ thrust[2] = (controls & 48)==16 ? EFFECTIVE_ACCEL : (controls & 48)==32 ? -EFFECTIVE_ACCEL : 0;
+
+ // @TODO clamp thrust[0] & thrust[1] vector length to EFFECTIVE_ACCEL
+
+ camera_to_worldspace(world_thrust, thrust[0], thrust[1], 0);
+ world_thrust[2] += thrust[2];
+
+ for (i=0; i < 3; ++i) {
+ float acc = world_thrust[i];
+ cam_vel[i] += acc*dt;
+ }
+
+ if (cam_vel[0] || cam_vel[1] || cam_vel[2])
+ {
+ float vel = (float) sqrt(cam_vel[0]*cam_vel[0] + cam_vel[1]*cam_vel[1] + cam_vel[2]*cam_vel[2]);
+ float newvel = vel;
+ float dec = STATIC_FRICTION + DYNAMIC_FRICTION*vel;
+ newvel = vel - dec*dt;
+ if (newvel < 0)
+ newvel = 0;
+ cam_vel[0] *= newvel/vel;
+ cam_vel[1] *= newvel/vel;
+ cam_vel[2] *= newvel/vel;
+ }
+
+ camloc[0] += cam_vel[0] * dt;
+ camloc[1] += cam_vel[1] * dt;
+ camloc[2] += cam_vel[2] * dt;
+
+ view_x_vel *= (float) pow(0.75, dt);
+ view_z_vel *= (float) pow(0.75, dt);
+
+ view_x_vel += (pending_view_x - view_x_vel)*dt*60;
+ view_z_vel += (pending_view_z - view_z_vel)*dt*60;
+
+ pending_view_x -= view_x_vel * dt;
+ pending_view_z -= view_z_vel * dt;
+ camang[0] += view_x_vel * dt;
+ camang[2] += view_z_vel * dt;
+ camang[0] = stb_clamp(camang[0], -90, 90);
+ camang[2] = (float) fmod(camang[2], 360);
+}
+
+void process_tick(float dt)
+{
+ while (dt > 1.0f/60) {
+ process_tick_raw(1.0f/60);
+ dt -= 1.0f/60;
+ }
+ process_tick_raw(dt);
+}
+
+void update_view(float dx, float dy)
+{
+ // hard-coded mouse sensitivity, not resolution independent?
+ pending_view_z -= dx*300;
+ pending_view_x -= dy*700;
+}
+
+extern int screen_x, screen_y;
+extern int is_synchronous_debug;
+float render_time;
+
+extern int chunk_locations, chunks_considered, chunks_in_frustum;
+extern int quads_considered, quads_rendered;
+extern int chunk_storage_rendered, chunk_storage_considered, chunk_storage_total;
+extern int view_dist_in_chunks;
+extern int num_threads_active, num_meshes_started, num_meshes_uploaded;
+extern float chunk_server_activity;
+
+static Uint64 start_time, end_time; // render time
+
+float chunk_server_status[32];
+int chunk_server_pos;
+
+void draw_stats(void)
+{
+ int i;
+
+ static Uint64 last_frame_time;
+ Uint64 cur_time = SDL_GetPerformanceCounter();
+ float chunk_server=0;
+ float frame_time = (cur_time - last_frame_time) / (float) SDL_GetPerformanceFrequency();
+ last_frame_time = cur_time;
+
+ chunk_server_status[chunk_server_pos] = chunk_server_activity;
+ chunk_server_pos = (chunk_server_pos+1) %32;
+
+ for (i=0; i < 32; ++i)
+ chunk_server += chunk_server_status[i] / 32.0f;
+
+ stb_easy_font_spacing(-0.75);
+ pos_y = 10;
+ text_color[0] = text_color[1] = text_color[2] = 1.0f;
+ print("Frame time: %6.2fms, CPU frame render time: %5.2fms", frame_time*1000, render_time*1000);
+ print("Tris: %4.1fM drawn of %4.1fM in range", 2*quads_rendered/1000000.0f, 2*quads_considered/1000000.0f);
+ print("Vbuf storage: %dMB in frustum of %dMB in range of %dMB in cache", chunk_storage_rendered>>20, chunk_storage_considered>>20, chunk_storage_total>>20);
+ print("Num mesh builds started this frame: %d; num uploaded this frame: %d\n", num_meshes_started, num_meshes_uploaded);
+ print("QChunks: %3d in frustum of %3d valid of %3d in range", chunks_in_frustum, chunks_considered, chunk_locations);
+ print("Mesh worker threads active: %d", num_threads_active);
+ print("View distance: %d blocks", view_dist_in_chunks*16);
+ print("%s", glGetString(GL_RENDERER));
+
+ if (is_synchronous_debug) {
+ text_color[0] = 1.0;
+ text_color[1] = 0.5;
+ text_color[2] = 0.5;
+ print("SLOWNESS: Synchronous debug output is enabled!");
+ }
+}
+
+void draw_main(void)
+{
+ glEnable(GL_CULL_FACE);
+ glDisable(GL_TEXTURE_2D);
+ glDisable(GL_LIGHTING);
+ glEnable(GL_DEPTH_TEST);
+ #ifdef REVERSE_DEPTH
+ glDepthFunc(GL_GREATER);
+ glClearDepth(0);
+ #else
+ glDepthFunc(GL_LESS);
+ glClearDepth(1);
+ #endif
+ glDepthMask(GL_TRUE);
+ glDisable(GL_SCISSOR_TEST);
+ glClearColor(0.6f,0.7f,0.9f,0.0f);
+ glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
+
+ glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
+ glColor3f(1,1,1);
+ glFrontFace(GL_CW);
+ glEnable(GL_TEXTURE_2D);
+ glDisable(GL_BLEND);
+
+
+ glMatrixMode(GL_PROJECTION);
+ glLoadIdentity();
+
+ #ifdef REVERSE_DEPTH
+ stbgl_Perspective(player_zoom, 90, 70, 3000, 1.0/16);
+ #else
+ stbgl_Perspective(player_zoom, 90, 70, 1.0/16, 3000);
+ #endif
+
+ // now compute where the camera should be
+ glMatrixMode(GL_MODELVIEW);
+ glLoadIdentity();
+ stbgl_initCamera_zup_facing_y();
+
+ glRotatef(-camang[0],1,0,0);
+ glRotatef(-camang[2],0,0,1);
+ glTranslatef(-camloc[0], -camloc[1], -camloc[2]);
+
+ start_time = SDL_GetPerformanceCounter();
+ render_caves(camloc);
+ end_time = SDL_GetPerformanceCounter();
+
+ render_time = (end_time - start_time) / (float) SDL_GetPerformanceFrequency();
+
+ glMatrixMode(GL_PROJECTION);
+ glLoadIdentity();
+ gluOrtho2D(0,screen_x/2,screen_y/2,0);
+ glMatrixMode(GL_MODELVIEW);
+ glLoadIdentity();
+ glDisable(GL_TEXTURE_2D);
+ glDisable(GL_BLEND);
+ glDisable(GL_CULL_FACE);
+ draw_stats();
+}
+
+
+
+#pragma warning(disable:4244; disable:4305; disable:4018)
+
+#define SCALE 2
+
+void error(char *s)
+{
+ SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "Error", s, NULL);
+ exit(0);
+}
+
+void ods(char *fmt, ...)
+{
+ char buffer[1000];
+ va_list va;
+ va_start(va, fmt);
+ vsprintf(buffer, fmt, va);
+ va_end(va);
+ SDL_Log("%s", buffer);
+}
+
+#define TICKS_PER_SECOND 60
+
+static SDL_Window *window;
+
+extern void draw_main(void);
+extern void process_tick(float dt);
+extern void editor_init(void);
+
+void draw(void)
+{
+ draw_main();
+ SDL_GL_SwapWindow(window);
+}
+
+
+static int initialized=0;
+static float last_dt;
+
+int screen_x,screen_y;
+
+float carried_dt = 0;
+#define TICKRATE 60
+
+float tex2_alpha = 1.0;
+
+int raw_level_time;
+
+float global_timer;
+int global_hack;
+
+int loopmode(float dt, int real, int in_client)
+{
+ if (!initialized) return 0;
+
+ if (!real)
+ return 0;
+
+   // clamp the update to 75ms (4.5 ticks at 60Hz) at a time
+   if (dt > 0.075) dt = 0.075;
+
+ global_timer += dt;
+
+ carried_dt += dt;
+ while (carried_dt > 1.0/TICKRATE) {
+ if (global_hack) {
+ tex2_alpha += global_hack / 60.0f;
+ if (tex2_alpha < 0) tex2_alpha = 0;
+ if (tex2_alpha > 1) tex2_alpha = 1;
+ }
+ //update_input();
+ // if the player is dead, stop the sim
+ carried_dt -= 1.0/TICKRATE;
+ }
+
+ process_tick(dt);
+ draw();
+
+ return 0;
+}
+
+static int quit;
+
+extern int controls;
+
+void active_control_set(int key)
+{
+ controls |= 1 << key;
+}
+
+void active_control_clear(int key)
+{
+ controls &= ~(1 << key);
+}
+
+extern void update_view(float dx, float dy);
+
+void process_sdl_mouse(SDL_Event *e)
+{
+ update_view((float) e->motion.xrel / screen_x, (float) e->motion.yrel / screen_y);
+}
+
+void process_event(SDL_Event *e)
+{
+ switch (e->type) {
+ case SDL_MOUSEMOTION:
+ process_sdl_mouse(e);
+ break;
+ case SDL_MOUSEBUTTONDOWN:
+ case SDL_MOUSEBUTTONUP:
+ break;
+
+ case SDL_QUIT:
+ quit = 1;
+ break;
+
+ case SDL_WINDOWEVENT:
+ switch (e->window.event) {
+ case SDL_WINDOWEVENT_SIZE_CHANGED:
+ screen_x = e->window.data1;
+ screen_y = e->window.data2;
+ loopmode(0,1,0);
+ break;
+ }
+ break;
+
+ case SDL_KEYDOWN: {
+ int k = e->key.keysym.sym;
+ int s = e->key.keysym.scancode;
+ SDL_Keymod mod;
+ mod = SDL_GetModState();
+ if (k == SDLK_ESCAPE)
+ quit = 1;
+
+ if (s == SDL_SCANCODE_D) active_control_set(0);
+ if (s == SDL_SCANCODE_A) active_control_set(1);
+ if (s == SDL_SCANCODE_W) active_control_set(2);
+ if (s == SDL_SCANCODE_S) active_control_set(3);
+ if (k == SDLK_SPACE) active_control_set(4);
+ if (s == SDL_SCANCODE_LCTRL) active_control_set(5);
+ if (s == SDL_SCANCODE_S) active_control_set(6);
+ if (s == SDL_SCANCODE_D) active_control_set(7);
+ if (k == '1') global_hack = !global_hack;
+ if (k == '2') global_hack = -1;
+
+ #if 0
+ if (game_mode == GAME_editor) {
+ switch (k) {
+ case SDLK_RIGHT: editor_key(STBTE_scroll_right); break;
+ case SDLK_LEFT : editor_key(STBTE_scroll_left ); break;
+ case SDLK_UP : editor_key(STBTE_scroll_up ); break;
+ case SDLK_DOWN : editor_key(STBTE_scroll_down ); break;
+ }
+ switch (s) {
+ case SDL_SCANCODE_S: editor_key(STBTE_tool_select); break;
+ case SDL_SCANCODE_B: editor_key(STBTE_tool_brush ); break;
+ case SDL_SCANCODE_E: editor_key(STBTE_tool_erase ); break;
+ case SDL_SCANCODE_R: editor_key(STBTE_tool_rectangle ); break;
+ case SDL_SCANCODE_I: editor_key(STBTE_tool_eyedropper); break;
+ case SDL_SCANCODE_L: editor_key(STBTE_tool_link); break;
+ case SDL_SCANCODE_G: editor_key(STBTE_act_toggle_grid); break;
+ }
+ if ((e->key.keysym.mod & KMOD_CTRL) && !(e->key.keysym.mod & ~KMOD_CTRL)) {
+ switch (s) {
+ case SDL_SCANCODE_X: editor_key(STBTE_act_cut ); break;
+ case SDL_SCANCODE_C: editor_key(STBTE_act_copy ); break;
+ case SDL_SCANCODE_V: editor_key(STBTE_act_paste); break;
+ case SDL_SCANCODE_Z: editor_key(STBTE_act_undo ); break;
+ case SDL_SCANCODE_Y: editor_key(STBTE_act_redo ); break;
+ }
+ }
+ }
+ #endif
+ break;
+ }
+ case SDL_KEYUP: {
+ int k = e->key.keysym.sym;
+ int s = e->key.keysym.scancode;
+ if (s == SDL_SCANCODE_D) active_control_clear(0);
+ if (s == SDL_SCANCODE_A) active_control_clear(1);
+ if (s == SDL_SCANCODE_W) active_control_clear(2);
+ if (s == SDL_SCANCODE_S) active_control_clear(3);
+ if (k == SDLK_SPACE) active_control_clear(4);
+ if (s == SDL_SCANCODE_LCTRL) active_control_clear(5);
+ if (s == SDL_SCANCODE_S) active_control_clear(6);
+ if (s == SDL_SCANCODE_D) active_control_clear(7);
+ break;
+ }
+ }
+}
+
+static SDL_GLContext context; // SDL_GLContext is already a pointer typedef
+
+static float getTimestep(float minimum_time)
+{
+ float elapsedTime;
+ double thisTime;
+ static double lastTime = -1;
+
+ if (lastTime == -1)
+ lastTime = SDL_GetTicks() / 1000.0 - minimum_time;
+
+ for(;;) {
+ thisTime = SDL_GetTicks() / 1000.0;
+ elapsedTime = (float) (thisTime - lastTime);
+ if (elapsedTime >= minimum_time) {
+ lastTime = thisTime;
+ return elapsedTime;
+ }
+ // @TODO: compute correct delay
+ SDL_Delay(1);
+ }
+}
+
+void APIENTRY gl_debug(GLenum source, GLenum type, GLuint id, GLenum severity, GLsizei length, const GLchar *message, const void *param)
+{
+ ods("%s\n", message);
+}
+
+int is_synchronous_debug;
+void enable_synchronous(void)
+{
+ glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB);
+ is_synchronous_debug = 1;
+}
+
+void prepare_threads(void);
+
+//void stbwingraph_main(void)
+int SDL_main(int argc, char **argv)
+{
+ SDL_Init(SDL_INIT_VIDEO);
+
+ prepare_threads();
+
+ SDL_GL_SetAttribute(SDL_GL_RED_SIZE , 8);
+ SDL_GL_SetAttribute(SDL_GL_GREEN_SIZE, 8);
+ SDL_GL_SetAttribute(SDL_GL_BLUE_SIZE , 8);
+ SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24);
+
+ SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_COMPATIBILITY);
+ SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
+ SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 1);
+
+ #ifdef GL_DEBUG
+ SDL_GL_SetAttribute(SDL_GL_CONTEXT_FLAGS, SDL_GL_CONTEXT_DEBUG_FLAG);
+ #endif
+
+ SDL_GL_SetAttribute(SDL_GL_MULTISAMPLESAMPLES, 4);
+
+ screen_x = 1920;
+ screen_y = 1080;
+
+ window = SDL_CreateWindow("caveview", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
+ screen_x, screen_y,
+ SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE
+ );
+ if (!window) error("Couldn't create window");
+
+ context = SDL_GL_CreateContext(window);
+ if (!context) error("Couldn't create context");
+
+ SDL_GL_MakeCurrent(window, context); // is this true by default?
+
+ SDL_SetRelativeMouseMode(SDL_TRUE);
+ #if defined(_MSC_VER) && _MSC_VER < 1300
+ // work around broken behavior in VC6 debugging
+ if (IsDebuggerPresent())
+ SDL_SetHint(SDL_HINT_MOUSE_RELATIVE_MODE_WARP, "1");
+ #endif
+
+ stbgl_initExtensions();
+
+ #ifdef GL_DEBUG
+ if (glDebugMessageCallbackARB) {
+ glDebugMessageCallbackARB(gl_debug, NULL);
+
+ enable_synchronous();
+ }
+ #endif
+
+ SDL_GL_SetSwapInterval(1);
+
+ render_init();
+ mesh_init();
+ world_init();
+
+ initialized = 1;
+
+ while (!quit) {
+ SDL_Event e;
+ while (SDL_PollEvent(&e))
+ process_event(&e);
+
+ loopmode(getTimestep(0.0166f/8), 1, 1);
+ }
+
+ return 0;
+}
diff --git a/vendor/stb/tests/caveview/cave_mesher.c b/vendor/stb/tests/caveview/cave_mesher.c
new file mode 100644
index 0000000..1f76c89
--- /dev/null
+++ b/vendor/stb/tests/caveview/cave_mesher.c
@@ -0,0 +1,933 @@
+// This file takes minecraft chunks (decoded by cave_parse) and
+// uses stb_voxel_render to turn them into vertex buffers.
+
+#define STB_GLEXT_DECLARE "glext_list.h"
+#include "stb_gl.h"
+#include "stb_image.h"
+#include "stb_glprog.h"
+
+#include "caveview.h"
+#include "cave_parse.h"
+#include "stb.h"
+#include "sdl.h"
+#include "sdl_thread.h"
+#include <math.h>
+#include <assert.h>
+
+//#define VHEIGHT_TEST
+//#define STBVOX_OPTIMIZED_VHEIGHT
+
+#define STBVOX_CONFIG_MODE 1
+#define STBVOX_CONFIG_OPENGL_MODELVIEW
+#define STBVOX_CONFIG_PREFER_TEXBUFFER
+//#define STBVOX_CONFIG_LIGHTING_SIMPLE
+#define STBVOX_CONFIG_FOG_SMOOTHSTEP
+//#define STBVOX_CONFIG_PREMULTIPLIED_ALPHA // doesn't work properly with alpha test without the next #define
+//#define STBVOX_CONFIG_UNPREMULTIPLY // slower; fixes alpha test; makes windows & fancy leaves look better
+//#define STBVOX_CONFIG_TEX1_EDGE_CLAMP
+#define STBVOX_CONFIG_DISABLE_TEX2
+//#define STBVOX_CONFIG_DOWN_TEXLERP_PACKED
+#define STBVOX_CONFIG_ROTATION_IN_LIGHTING
+
+#define STB_VOXEL_RENDER_IMPLEMENTATION
+#include "stb_voxel_render.h"
+
+extern void ods(char *fmt, ...);
+
+//#define FANCY_LEAVES // nearly 2x the triangles when enabled (if underground is filled)
+#define FAST_CHUNK
+#define IN_PLACE
+
+#define SKIP_TERRAIN 0
+//#define SKIP_TERRAIN 48 // use to avoid building underground stuff
+ // allows you to see what perf would be like if underground was efficiently culled,
+ // or if you were making a game without underground
+
+enum
+{
+ C_empty,
+ C_solid,
+ C_trans,
+ C_cross,
+ C_water,
+ C_slab,
+ C_stair,
+ C_force,
+};
+
+unsigned char geom_map[] =
+{
+ STBVOX_GEOM_empty,
+ STBVOX_GEOM_solid,
+ STBVOX_GEOM_transp,
+ STBVOX_GEOM_crossed_pair,
+ STBVOX_GEOM_solid,
+ STBVOX_GEOM_slab_lower,
+ STBVOX_GEOM_floor_slope_north_is_top,
+ STBVOX_GEOM_force,
+};
+
+unsigned char minecraft_info[256][7] =
+{
+ { C_empty, 0,0,0,0,0,0 },
+ { C_solid, 1,1,1,1,1,1 },
+ { C_solid, 3,3,3,3,40,2 },
+ { C_solid, 2,2,2,2,2,2 },
+ { C_solid, 16,16,16,16,16,16 },
+ { C_solid, 4,4,4,4,4,4 },
+ { C_cross, 15,15,15,15 },
+ { C_solid, 17,17,17,17,17,17 },
+
+ // 8
+ { C_water, 223,223,223,223,223,223 },
+ { C_water, 223,223,223,223,223,223 },
+ { C_solid, 255,255,255,255,255,255 },
+ { C_solid, 255,255,255,255,255,255 },
+ { C_solid, 18,18,18,18,18,18 },
+ { C_solid, 19,19,19,19,19,19 },
+ { C_solid, 32,32,32,32,32,32 },
+ { C_solid, 33,33,33,33,33,33 },
+
+ // 16
+ { C_solid, 34,34,34,34,34,34 },
+ { C_solid, 20,20,20,20,21,21 },
+#ifdef FANCY_LEAVES
+ { C_force, 52,52,52,52,52,52 }, // leaves
+#else
+ { C_solid, 53,53,53,53,53,53 }, // leaves
+#endif
+ { C_solid, 24,24,24,24,24,24 },
+ { C_trans, 49,49,49,49,49,49 }, // glass
+ { C_solid, 160,160,160,160,160,160 },
+ { C_solid, 144,144,144,144,144,144 },
+ { C_solid, 46,45,45,45,62,62 },
+
+ // 24
+ { C_solid, 192,192,192,192, 176,176 },
+ { C_solid, 74,74,74,74,74,74 },
+ { C_empty }, // bed
+ { C_empty }, // powered rail
+ { C_empty }, // detector rail
+ { C_solid, 106,108,109,108,108,108 },
+ { C_empty }, // cobweb=11
+ { C_cross, 39,39,39,39 },
+
+ // 32
+ { C_cross, 55,55,55,55,0,0 },
+ { C_solid, 107,108,109,108,108,108 },
+ { C_empty }, // piston head
+ { C_solid, 64,64,64,64,64,64 }, // various colors
+ { C_empty }, // unused
+ { C_cross, 13,13,13,13,0,0 },
+ { C_cross, 12,12,12,12,0,0 },
+ { C_cross, 29,29,29,29,0,0 },
+
+ // 40
+ { C_cross, 28,28,28,28,0,0 },
+ { C_solid, 23,23,23,23,23,23 },
+ { C_solid, 22,22,22,22,22,22 },
+ { C_solid, 5,5,5,5,6,6, },
+ { C_slab , 5,5,5,5,6,6, },
+ { C_solid, 7,7,7,7,7,7, },
+ { C_solid, 8,8,8,8,9,10 },
+ { C_solid, 35,35,35,35,4,4, },
+
+ // 48
+ //{ C_solid, 36,36,36,36,36,36 },
+ { C_force, 36,36,36,36,36,36 },
+ { C_solid, 37,37,37,37,37,37 },
+ { C_cross, 80,80,80,80,80,80 }, // torch
+ { C_empty }, // fire
+ { C_trans, 65,65,65,65,65,65 },
+ { C_stair, 4,4,4,4,4,4 },
+ { C_solid, 26,26,26,27,25,25 },
+ { C_empty }, // redstone
+
+ // 56
+ { C_solid, 50,50,50,50,50,50 },
+ //{ C_force, 50,50,50,50,50,50 },
+ { C_solid, 26,26,26,26,26,26 },
+ { C_solid, 60,59,59,59,43,43 },
+ { C_cross, 95,95,95,95 },
+ { C_solid, 2,2,2,2,86,2 },
+ { C_solid, 44,45,45,45,62,62 },
+ { C_solid, 61,45,45,45,62,62 },
+ { C_empty }, // sign
+
+ // 64
+ { C_empty }, // door
+ { C_empty }, // ladder
+ { C_empty }, // rail
+ { C_stair, 16,16,16,16,16,16 }, // cobblestone stairs
+ { C_empty }, // sign
+ { C_empty }, // lever
+ { C_empty }, // stone pressure plate
+ { C_empty }, // iron door
+
+ // 72
+ { C_empty }, // wooden pressure
+ { C_solid, 51,51,51,51,51,51 },
+ { C_solid, 51,51,51,51,51,51 },
+ { C_empty },
+ { C_empty },
+ { C_empty },
+ { C_empty }, // snow on block below, do as half slab?
+ { C_solid, 67,67,67,67,67,67 },
+
+ // 80
+ { C_solid, 66,66,66,66,66,66 },
+ { C_solid, 70,70,70,70,69,71 },
+ { C_solid, 72,72,72,72,72,72 },
+ { C_cross, 73,73,73,73,73,73 },
+ { C_solid, 74,74,74,74,75,74 },
+ { C_empty }, // fence
+ { C_solid,119,118,118,118,102,102 },
+ { C_solid,103,103,103,103,103,103 },
+
+ // 88
+ { C_solid, 104,104,104,104,104,104 },
+ { C_solid, 105,105,105,105,105,105 },
+ { C_solid, 167,167,167,167,167,167 },
+ { C_solid, 120,118,118,118,102,102 },
+ { C_empty }, // cake
+ { C_empty }, // repeater
+ { C_empty }, // repeater
+ { C_solid, 49,49,49,49,49,49 }, // colored glass
+
+ // 96
+ { C_empty },
+ { C_empty },
+ { C_solid, 54,54,54,54,54,54 },
+ { C_solid, 125,125,125,125,125,125 },
+ { C_solid, 126,126,126,126,126,126 },
+ { C_empty }, // bars
+ { C_trans, 49,49,49,49,49,49 }, // glass pane
+ { C_solid, 136,136,136,136,137,137 }, // melon
+
+ // 104
+ { C_empty }, // pumpkin stem
+ { C_empty }, // melon stem
+ { C_empty }, // vines
+ { C_empty }, // gate
+ { C_stair, 7,7,7,7,7,7, }, // brick stairs
+ { C_stair, 54,54,54,54,54,54 }, // stone brick stairs
+ { C_empty }, // mycelium
+ { C_empty }, // lily pad
+
+ // 112
+ { C_solid, 224,224,224,224,224,224 },
+ { C_empty }, // nether brick fence
+ { C_stair, 224,224,224,224,224,224 }, // nether brick stairs
+ { C_empty }, // nether wart
+ { C_solid, 182,182,182,182,166,183 },
+ { C_empty }, // brewing stand
+ { C_empty }, // cauldron
+ { C_empty }, // end portal
+
+ // 120
+ { C_solid, 159,159,159,159,158,158 },
+ { C_solid, 175,175,175,175,175,175 },
+ { C_empty }, // dragon egg
+ { C_solid, 211,211,211,211,211,211 },
+ { C_solid, 212,212,212,212,212,212 },
+ { C_solid, 4,4,4,4,4,4, }, // wood double-slab
+ { C_slab , 4,4,4,4,4,4, }, // wood slab
+ { C_empty }, // cocoa
+
+ // 128
+ { C_solid, 192,192,192,192,176,176 }, // sandstone stairs
+ { C_solid, 32,32,32,32,32,32 }, // emerald ore
+ { C_solid, 26,26,26,27,25,25 }, // ender chest
+ { C_empty },
+ { C_empty },
+ { C_solid, 23,23,23,23,23,23 }, // emerald block
+ { C_solid, 198,198,198,198,198,198 }, // spruce stairs
+ { C_solid, 214,214,214,214,214,214 }, // birch stairs
+
+ // 136
+ { C_stair, 199,199,199,199,199,199 }, // jungle stairs
+ { C_empty }, // command block
+ { C_empty }, // beacon
+ { C_slab, 16,16,16,16,16,16 }, // cobblestone wall
+ { C_empty }, // flower pot
+ { C_empty }, // carrot
+ { C_empty }, // potatoes
+ { C_empty }, // wooden button
+
+ // 144
+ { C_empty }, // mob head
+ { C_empty }, // anvil
+ { C_solid, 26,26,26,27,25,25 }, // trapped chest
+ { C_empty }, // weighted pressure plate light
+   { C_empty }, // weighted pressure plate heavy
+ { C_empty }, // comparator inactive
+ { C_empty }, // comparator active
+ { C_empty }, // daylight sensor
+
+ // 152
+ { C_solid, 135,135,135,135,135,135 }, // redstone block
+ { C_solid, 0,0,0,0,0,0, }, // nether quartz ore
+ { C_empty }, // hopper
+ { C_solid, 22,22,22,22,22,22 }, // quartz block
+ { C_stair, 22,22,22,22,22,22 }, // quartz stairs
+ { C_empty }, // activator rail
+ { C_solid, 46,45,45,45,62,62 }, // dropper
+ { C_solid, 72,72,72,72,72,72 }, // stained clay
+
+ // 160
+ { C_trans, 49,49,49,49,49,49 }, // stained glass pane
+ #ifdef FANCY_LEAVES
+ { C_force, 52,52,52,52,52,52 }, // leaves
+ #else
+ { C_solid, 53,53,53,53,53,53 }, // acacia leaves
+ #endif
+ { C_solid, 20,20,20,20,21,21 }, // acacia tree
+ { C_solid, 199,199,199,199,199,199 }, // acacia wood stairs
+ { C_solid, 198,198,198,198,198,198 }, // dark oak stairs
+ { C_solid, 146,146,146,146,146,146 }, // slime block
+
+ { C_solid, 176,176,176,176,176,176 }, // red sandstone
+ { C_solid, 176,176,176,176,176,176 }, // red sandstone
+
+ // 168
+ { C_empty },
+ { C_empty },
+ { C_empty },
+ { C_empty },
+ { C_solid, 72,72,72,72,72,72 }, // hardened clay
+ { C_empty },
+ { C_empty },
+ { C_empty },
+
+ // 176
+ { C_empty },
+ { C_empty },
+ { C_solid, 176,176,176,176,176,176 }, // red sandstone
+};
+
+unsigned char minecraft_tex1_for_blocktype[256][6];
+unsigned char effective_blocktype[256];
+unsigned char minecraft_color_for_blocktype[256][6];
+unsigned char minecraft_geom_for_blocktype[256];
+
+uint8 build_buffer[BUILD_BUFFER_SIZE];
+uint8 face_buffer[FACE_BUFFER_SIZE];
+
+//GLuint vbuf, fbuf, fbuf_tex;
+
+//unsigned char tex1_for_blocktype[256][6];
+
+//unsigned char blocktype[34][34][257];
+//unsigned char lighting[34][34][257];
+
+// a superchunk is 64x64x256, with the border blocks computed as well,
+// which means we need 4x4 chunks plus 16 border chunks plus 4 corner chunks
+
+#define SUPERCHUNK_X 4
+#define SUPERCHUNK_Y 4
+
+unsigned char remap_data[16][16];
+unsigned char remap[256];
+unsigned char rotate_data[4] = { 1,3,2,0 };
+
+void convert_fastchunk_inplace(fast_chunk *fc)
+{
+ int i;
+ int num_blocks=0, step=0;
+ unsigned char rot[4096];
+ #ifndef IN_PLACE
+ unsigned char *storage;
+ #endif
+
+ memset(rot, 0, 4096);
+
+ for (i=0; i < 16; ++i)
+ num_blocks += fc->blockdata[i] != NULL;
+
+ #ifndef IN_PLACE
+ storage = malloc(16*16*16*2 * num_blocks);
+ #endif
+
+ for (i=0; i < 16; ++i) {
+ if (fc->blockdata[i]) {
+ int o=0;
+ unsigned char *bd,*dd,*lt,*sky;
+ unsigned char *out, *outb;
+
+ // this ordering allows us to determine which data we can safely overwrite for in-place processing
+ bd = fc->blockdata[i];
+ dd = fc->data[i];
+ lt = fc->light[i];
+ sky = fc->skylight[i];
+
+ #ifdef IN_PLACE
+ out = bd;
+ #else
+ out = storage + 16*16*16*2*step;
+ #endif
+
+ // bd is written in place, but also reads from dd
+ for (o=0; o < 16*16*16/2; o += 1) {
+ unsigned char v1,v2;
+ unsigned char d = dd[o];
+ v1 = bd[o*2+0];
+ v2 = bd[o*2+1];
+
+ if (remap[v1])
+ {
+ //unsigned char d = bd[o] & 15;
+ v1 = remap_data[remap[v1]][d&15];
+ rot[o*2+0] = rotate_data[d&3];
+ } else
+ v1 = effective_blocktype[v1];
+
+ if (remap[v2])
+ {
+ //unsigned char d = bd[o] >> 4;
+ v2 = remap_data[remap[v2]][d>>4];
+ rot[o*2+1] = rotate_data[(d>>4)&3];
+ } else
+ v2 = effective_blocktype[v2];
+
+ out[o*2+0] = v1;
+ out[o*2+1] = v2;
+ }
+
+ // this reads from lt & sky
+ #ifndef IN_PLACE
+ outb = out + 16*16*16;
+ ++step;
+ #endif
+
+ // MC used to write in this order and it makes it possible to compute in-place
+ if (dd < sky && sky < lt) {
+ // @TODO go this path always if !IN_PLACE
+ #ifdef IN_PLACE
+ outb = dd;
+ #endif
+
+ for (o=0; o < 16*16*16/2; ++o) {
+ int bright;
+ bright = (lt[o]&15)*12 + 15 + (sky[o]&15)*16;
+ if (bright > 255) bright = 255;
+ if (bright < 32) bright = 32;
+ outb[o*2+0] = STBVOX_MAKE_LIGHTING_EXT((unsigned char) bright, (rot[o*2+0]&3));
+
+ bright = (lt[o]>>4)*12 + 15 + (sky[o]>>4)*16;
+ if (bright > 255) bright = 255;
+ if (bright < 32) bright = 32;
+ outb[o*2+1] = STBVOX_MAKE_LIGHTING_EXT((unsigned char) bright, (rot[o*2+1]&3));
+ }
+ } else {
+ // @TODO: if blocktype is in between others, this breaks; need to find which side has two pointers, and use that
+ // overwrite rot[] array, then copy out
+ #ifdef IN_PLACE
+ outb = (dd < sky) ? dd : sky;
+ if (lt < outb) lt = outb;
+ #endif
+
+ for (o=0; o < 16*16*16/2; ++o) {
+ int bright;
+ bright = (lt[o]&15)*12 + 15 + (sky[o]&15)*16;
+ if (bright > 255) bright = 255;
+ if (bright < 32) bright = 32;
+ rot[o*2+0] = STBVOX_MAKE_LIGHTING_EXT((unsigned char) bright, (rot[o*2+0]&3));
+
+ bright = (lt[o]>>4)*12 + 15 + (sky[o]>>4)*16;
+ if (bright > 255) bright = 255;
+ if (bright < 32) bright = 32;
+ rot[o*2+1] = STBVOX_MAKE_LIGHTING_EXT((unsigned char) bright, (rot[o*2+1]&3));
+ }
+
+ memcpy(outb, rot, 4096);
+ fc->data[i] = outb;
+ }
+
+ #ifndef IN_PLACE
+ fc->blockdata[i] = out;
+ fc->data[i] = outb;
+ #endif
+ }
+ }
+
+ #ifndef IN_PLACE
+ free(fc->pointer_to_free);
+ fc->pointer_to_free = storage;
+ #endif
+}
+
+void make_converted_fastchunk(fast_chunk *fc, int x, int y, int segment, uint8 *sv_blocktype, uint8 *sv_lighting)
+{
+ int z;
+ assert(fc == NULL || (fc->refcount > 0 && fc->refcount < 64));
+ if (fc == NULL || fc->blockdata[segment] == NULL) {
+ for (z=0; z < 16; ++z) {
+ sv_blocktype[z] = C_empty;
+ sv_lighting[z] = 255;
+ }
+ } else {
+ unsigned char *block = fc->blockdata[segment];
+ unsigned char *data = fc->data[segment];
+ y = 15-y;
+ for (z=0; z < 16; ++z) {
+ sv_blocktype[z] = block[z*256 + y*16 + x];
+ sv_lighting [z] = data [z*256 + y*16 + x];
+ }
+ }
+}
+
+
+#define CHUNK_CACHE 64
+typedef struct
+{
+ int valid;
+ int chunk_x, chunk_y;
+ fast_chunk *fc;
+} cached_converted_chunk;
+
+cached_converted_chunk chunk_cache[CHUNK_CACHE][CHUNK_CACHE];
+int cache_size = CHUNK_CACHE;
+
+void reset_cache_size(int size)
+{
+ int i,j;
+ for (j=size; j < cache_size; ++j) {
+ for (i=size; i < cache_size; ++i) {
+ cached_converted_chunk *ccc = &chunk_cache[j][i];
+ if (ccc->valid) {
+ if (ccc->fc) {
+ free(ccc->fc->pointer_to_free);
+ free(ccc->fc);
+ ccc->fc = NULL;
+ }
+ ccc->valid = 0;
+ }
+ }
+ }
+ cache_size = size;
+}
+
+// this must be called inside mutex
+void deref_fastchunk(fast_chunk *fc)
+{
+ if (fc) {
+ assert(fc->refcount > 0);
+ --fc->refcount;
+ if (fc->refcount == 0) {
+ free(fc->pointer_to_free);
+ free(fc);
+ }
+ }
+}
+
+SDL_mutex * chunk_cache_mutex;
+SDL_mutex * chunk_get_mutex;
+
+void lock_chunk_get_mutex(void)
+{
+ SDL_LockMutex(chunk_get_mutex);
+}
+void unlock_chunk_get_mutex(void)
+{
+ SDL_UnlockMutex(chunk_get_mutex);
+}
+
+fast_chunk *get_converted_fastchunk(int chunk_x, int chunk_y)
+{
+ int slot_x = (chunk_x & (cache_size-1));
+ int slot_y = (chunk_y & (cache_size-1));
+ fast_chunk *fc;
+ cached_converted_chunk *ccc;
+ SDL_LockMutex(chunk_cache_mutex);
+ ccc = &chunk_cache[slot_y][slot_x];
+ if (ccc->valid) {
+ if (ccc->chunk_x == chunk_x && ccc->chunk_y == chunk_y) {
+ fast_chunk *fc = ccc->fc;
+ if (fc)
+ ++fc->refcount;
+ SDL_UnlockMutex(chunk_cache_mutex);
+ return fc;
+ }
+ if (ccc->fc) {
+ deref_fastchunk(ccc->fc);
+ ccc->fc = NULL;
+ ccc->valid = 0;
+ }
+ }
+ SDL_UnlockMutex(chunk_cache_mutex);
+
+ fc = get_decoded_fastchunk_uncached(chunk_x, -chunk_y);
+ if (fc)
+ convert_fastchunk_inplace(fc);
+
+ SDL_LockMutex(chunk_cache_mutex);
+ // another thread might have updated it, so before we overwrite it...
+ if (ccc->fc) {
+ deref_fastchunk(ccc->fc);
+ ccc->fc = NULL;
+ }
+
+ if (fc)
+ fc->refcount = 1; // 1 in the cache
+
+ ccc->chunk_x = chunk_x;
+ ccc->chunk_y = chunk_y;
+ ccc->valid = 1;
+ if (fc)
+ ++fc->refcount;
+ ccc->fc = fc;
+ SDL_UnlockMutex(chunk_cache_mutex);
+ return fc;
+}
+
+void make_map_segment_for_superchunk_preconvert(int chunk_x, int chunk_y, int segment, fast_chunk *fc_table[4][4], uint8 sv_blocktype[34][34][18], uint8 sv_lighting[34][34][18])
+{
+ int a,b;
+ assert((chunk_x & 1) == 0);
+ assert((chunk_y & 1) == 0);
+ for (b=-1; b < 3; ++b) {
+ for (a=-1; a < 3; ++a) {
+ int xo = a*16+1;
+ int yo = b*16+1;
+ int x,y;
+ fast_chunk *fc = fc_table[b+1][a+1];
+ for (y=0; y < 16; ++y)
+ for (x=0; x < 16; ++x)
+ if (xo+x >= 0 && xo+x < 34 && yo+y >= 0 && yo+y < 34)
+ make_converted_fastchunk(fc,x,y, segment, sv_blocktype[xo+x][yo+y], sv_lighting[xo+x][yo+y]);
+ }
+ }
+}
+
+// build 1 mesh covering 2x2 chunks
+void build_chunk(int chunk_x, int chunk_y, fast_chunk *fc_table[4][4], raw_mesh *rm)
+{
+ int a,b,z;
+ stbvox_input_description *map;
+
+ #ifdef VHEIGHT_TEST
+ unsigned char vheight[34][34][18];
+ #endif
+
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ unsigned char tex2_choice[34][34][18];
+ #endif
+
+ assert((chunk_x & 1) == 0);
+ assert((chunk_y & 1) == 0);
+
+ rm->cx = chunk_x;
+ rm->cy = chunk_y;
+
+ stbvox_set_input_stride(&rm->mm, 34*18, 18);
+
+ assert(rm->mm.input.geometry == NULL);
+
+ map = stbvox_get_input_description(&rm->mm);
+ map->block_tex1_face = minecraft_tex1_for_blocktype;
+ map->block_color_face = minecraft_color_for_blocktype;
+ map->block_geometry = minecraft_geom_for_blocktype;
+
+ stbvox_reset_buffers(&rm->mm);
+ stbvox_set_buffer(&rm->mm, 0, 0, rm->build_buffer, BUILD_BUFFER_SIZE);
+ stbvox_set_buffer(&rm->mm, 0, 1, rm->face_buffer , FACE_BUFFER_SIZE);
+
+ map->blocktype = &rm->sv_blocktype[1][1][1]; // this is (0,0,0), but we need to be able to query off the edges
+ map->lighting = &rm->sv_lighting[1][1][1];
+
+ // fill in the top two rows of the buffer
+ for (a=0; a < 34; ++a) {
+ for (b=0; b < 34; ++b) {
+ rm->sv_blocktype[a][b][16] = 0;
+ rm->sv_lighting [a][b][16] = 255;
+ rm->sv_blocktype[a][b][17] = 0;
+ rm->sv_lighting [a][b][17] = 255;
+ }
+ }
+
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ for (a=0; a < 34; ++a) {
+ for (b=0; b < 34; ++b) {
+ int px = chunk_x*16 + a - 1;
+ int py = chunk_y*16 + b - 1;
+ float dist = (float) sqrt(px*px + py*py);
+ float s1 = (float) sin(dist / 16), s2, s3;
+ dist = (float) sqrt((px-80)*(px-80) + (py-50)*(py-50));
+ s2 = (float) sin(dist / 11);
+ for (z=0; z < 18; ++z) {
+ s3 = (float) sin(z * 3.141592 / 8);
+
+ s3 = s1*s2*s3;
+ tex2_choice[a][b][z] = 63 & (int) stb_linear_remap(s3,-1,1, -20,83);
+ }
+ }
+ }
+ #endif
+
+ for (z=256-16; z >= SKIP_TERRAIN; z -= 16)
+ {
+ int z0 = z;
+ int z1 = z+16;
+ if (z1 == 256) z1 = 255;
+
+ make_map_segment_for_superchunk_preconvert(chunk_x, chunk_y, z >> 4, fc_table, rm->sv_blocktype, rm->sv_lighting);
+
+ map->blocktype = &rm->sv_blocktype[1][1][1-z]; // specify location of 0,0,0 so that accessing z0..z1 gets right data
+ map->lighting = &rm->sv_lighting[1][1][1-z];
+ #ifndef STBVOX_CONFIG_DISABLE_TEX2
+ map->tex2 = &tex2_choice[1][1][1-z];
+ #endif
+
+ #ifdef VHEIGHT_TEST
+ // hacky test of vheight
+ for (a=0; a < 34; ++a) {
+ for (b=0; b < 34; ++b) {
+ int c;
+ for (c=0; c < 17; ++c) {
+ if (rm->sv_blocktype[a][b][c] != 0 && rm->sv_blocktype[a][b][c+1] == 0) {
+ // topmost block
+ vheight[a][b][c] = rand() & 255;
+ rm->sv_blocktype[a][b][c] = 168;
+ } else if (c > 0 && rm->sv_blocktype[a][b][c] != 0 && rm->sv_blocktype[a][b][c-1] == 0) {
+ // bottommost block
+ vheight[a][b][c] = ((rand() % 3) << 6) + ((rand() % 3) << 4) + ((rand() % 3) << 2) + (rand() % 3);
+ rm->sv_blocktype[a][b][c] = 169;
+ }
+ }
+ vheight[a][b][c] = STBVOX_MAKE_VHEIGHT(2,2,2,2); // flat top
+ }
+ }
+ map->vheight = &vheight[1][1][1-z];
+ #endif
+
+ {
+ stbvox_set_input_range(&rm->mm, 0,0,z0, 32,32,z1);
+ stbvox_set_default_mesh(&rm->mm, 0);
+ stbvox_make_mesh(&rm->mm);
+ }
+
+ // copy the bottom two rows of data up to the top
+ for (a=0; a < 34; ++a) {
+ for (b=0; b < 34; ++b) {
+ rm->sv_blocktype[a][b][16] = rm->sv_blocktype[a][b][0];
+ rm->sv_blocktype[a][b][17] = rm->sv_blocktype[a][b][1];
+ rm->sv_lighting [a][b][16] = rm->sv_lighting [a][b][0];
+ rm->sv_lighting [a][b][17] = rm->sv_lighting [a][b][1];
+ }
+ }
+ }
+
+ stbvox_set_mesh_coordinates(&rm->mm, chunk_x*16, chunk_y*16, 0);
+ stbvox_get_transform(&rm->mm, rm->transform);
+
+ stbvox_set_input_range(&rm->mm, 0,0,0, 32,32,255);
+ stbvox_get_bounds(&rm->mm, rm->bounds);
+
+ rm->num_quads = stbvox_get_quad_count(&rm->mm, 0);
+}
+
+int next_blocktype = 255;
+
+unsigned char mc_rot[4] = { 1,3,2,0 };
+
+// create blocktypes with rotation baked into type...
+// @TODO we no longer need this now that we store rotations
+// in lighting
+void build_stair_rotations(int blocktype, unsigned char *map)
+{
+ int i;
+
+ // use the existing block type for floor stairs; allocate a new type for ceil stairs
+ for (i=0; i < 6; ++i) {
+ minecraft_color_for_blocktype[next_blocktype][i] = minecraft_color_for_blocktype[blocktype][i];
+ minecraft_tex1_for_blocktype [next_blocktype][i] = minecraft_tex1_for_blocktype [blocktype][i];
+ }
+ minecraft_geom_for_blocktype[next_blocktype] = (unsigned char) STBVOX_MAKE_GEOMETRY(STBVOX_GEOM_ceil_slope_north_is_bottom, 0, 0);
+ minecraft_geom_for_blocktype[ blocktype] = (unsigned char) STBVOX_MAKE_GEOMETRY(STBVOX_GEOM_floor_slope_north_is_top, 0, 0);
+
+ for (i=0; i < 4; ++i) {
+ map[0+i+8] = map[0+i] = blocktype;
+ map[4+i+8] = map[4+i] = next_blocktype;
+ }
+ --next_blocktype;
+}
+
+void build_wool_variations(int bt, unsigned char *map)
+{
+ int i,k;
+ unsigned char tex[16] = { 64, 210, 194, 178, 162, 146, 130, 114, 225, 209, 193, 177, 161, 145, 129, 113 };
+ for (i=0; i < 16; ++i) {
+ if (i == 0)
+ map[i] = bt;
+ else {
+ map[i] = next_blocktype;
+ for (k=0; k < 6; ++k) {
+ minecraft_tex1_for_blocktype[next_blocktype][k] = tex[i];
+ }
+ minecraft_geom_for_blocktype[next_blocktype] = minecraft_geom_for_blocktype[bt];
+ --next_blocktype;
+ }
+ }
+}
+
+void build_wood_variations(int bt, unsigned char *map)
+{
+ int i,k;
+ unsigned char tex[4] = { 5, 198, 214, 199 };
+ for (i=0; i < 4; ++i) {
+ if (i == 0)
+ map[i] = bt;
+ else {
+ map[i] = next_blocktype;
+ for (k=0; k < 6; ++k) {
+ minecraft_tex1_for_blocktype[next_blocktype][k] = tex[i];
+ }
+ minecraft_geom_for_blocktype[next_blocktype] = minecraft_geom_for_blocktype[bt];
+ --next_blocktype;
+ }
+ }
+ map[i] = map[i-1];
+ ++i;
+ for (; i < 16; ++i)
+ map[i] = bt;
+}
+
+void remap_in_place(int bt, int rm)
+{
+ int i;
+ remap[bt] = rm;
+ for (i=0; i < 16; ++i)
+ remap_data[rm][i] = bt;
+}
+
+
+void mesh_init(void)
+{
+ int i;
+
+ chunk_cache_mutex = SDL_CreateMutex();
+ chunk_get_mutex = SDL_CreateMutex();
+
+ for (i=0; i < 256; ++i) {
+ memcpy(minecraft_tex1_for_blocktype[i], minecraft_info[i]+1, 6);
+ effective_blocktype[i] = (minecraft_info[i][0] == C_empty ? 0 : i);
+ minecraft_geom_for_blocktype[i] = geom_map[minecraft_info[i][0]];
+ }
+ //effective_blocktype[50] = 0; // delete torches
+
+ for (i=0; i < 6*256; ++i) {
+ if (minecraft_tex1_for_blocktype[0][i] == 40)
+ minecraft_color_for_blocktype[0][i] = 38 | 64; // apply to tex1
+ if (minecraft_tex1_for_blocktype[0][i] == 39)
+ minecraft_color_for_blocktype[0][i] = 39 | 64; // apply to tex1
+ if (minecraft_tex1_for_blocktype[0][i] == 105)
+ minecraft_color_for_blocktype[0][i] = 63; // emissive
+ if (minecraft_tex1_for_blocktype[0][i] == 212)
+ minecraft_color_for_blocktype[0][i] = 63; // emissive
+ if (minecraft_tex1_for_blocktype[0][i] == 80)
+ minecraft_color_for_blocktype[0][i] = 63; // emissive
+ }
+
+ for (i=0; i < 6; ++i) {
+ minecraft_color_for_blocktype[172][i] = 47 | 64; // apply to tex1
+ minecraft_color_for_blocktype[178][i] = 47 | 64; // apply to tex1
+ minecraft_color_for_blocktype[18][i] = 39 | 64; // green
+ minecraft_color_for_blocktype[161][i] = 37 | 64; // green
+ minecraft_color_for_blocktype[10][i] = 63; // emissive lava
+ minecraft_color_for_blocktype[11][i] = 63; // emissive
+ //minecraft_color_for_blocktype[56][i] = 63; // emissive diamond
+ minecraft_color_for_blocktype[48][i] = 63; // emissive dungeon
+ }
+
+ #ifdef VHEIGHT_TEST
+ effective_blocktype[168] = 168;
+ minecraft_tex1_for_blocktype[168][0] = 1;
+ minecraft_tex1_for_blocktype[168][1] = 1;
+ minecraft_tex1_for_blocktype[168][2] = 1;
+ minecraft_tex1_for_blocktype[168][3] = 1;
+ minecraft_tex1_for_blocktype[168][4] = 1;
+ minecraft_tex1_for_blocktype[168][5] = 1;
+ minecraft_geom_for_blocktype[168] = STBVOX_GEOM_floor_vheight_12;
+ effective_blocktype[169] = 169;
+ minecraft_tex1_for_blocktype[169][0] = 1;
+ minecraft_tex1_for_blocktype[169][1] = 1;
+ minecraft_tex1_for_blocktype[169][2] = 1;
+ minecraft_tex1_for_blocktype[169][3] = 1;
+ minecraft_tex1_for_blocktype[169][4] = 1;
+ minecraft_tex1_for_blocktype[169][5] = 1;
+ minecraft_geom_for_blocktype[169] = STBVOX_GEOM_ceil_vheight_03;
+ #endif
+
+ remap[53] = 1;
+ remap[67] = 2;
+ remap[108] = 3;
+ remap[109] = 4;
+ remap[114] = 5;
+ remap[136] = 6;
+ remap[156] = 7;
+ for (i=0; i < 256; ++i)
+ if (remap[i])
+ build_stair_rotations(i, remap_data[remap[i]]);
+ remap[35] = 8;
+ build_wool_variations(35, remap_data[remap[35]]);
+ remap[5] = 11;
+ build_wood_variations(5, remap_data[remap[5]]);
+
+ // set the remap flags for these so they write the rotation values
+ remap_in_place(54, 9);
+ remap_in_place(146, 10);
+}
+
+// Timing stats while optimizing the single-threaded builder
+
+// 32..-32, 32..-32, SKIP_TERRAIN=0, !FANCY_LEAVES on 'mcrealm' data set
+
+// 6.27s - reblocked to do 16 z at a time instead of 256 (still using 66x66x258), 4 meshes in parallel
+// 5.96s - reblocked to use FAST_CHUNK (no intermediate data structure)
+// 5.45s - unknown change, or previous measurement was wrong
+
+// 6.12s - use preconverted data, not in-place
+// 5.91s - use preconverted, in-place
+// 5.34s - preconvert, in-place, avoid dependency chain (suggested by ryg)
+// 5.34s - preconvert, in-place, avoid dependency chain, use bit-table instead of byte-table
+// 5.50s - preconvert, in-place, branchless
+
+// 6.42s - non-preconvert, avoid dependency chain (not an error)
+// 5.40s - non-preconvert, w/dependency chain (same as earlier)
+
+// 5.50s - non-FAST_CHUNK, reblocked outer loop for better cache reuse
+// 4.73s - FAST_CHUNK non-preconvert, reblocked outer loop
+// 4.25s - preconvert, in-place, reblocked outer loop
+// 4.18s - preconvert, in-place, unrolled again
+// 4.10s - 34x34 1 mesh instead of 66x66 and 4 meshes (will make it easier to do multiple threads)
+
+// 4.83s - building bitmasks but not using them (2 bits per block, one if empty, one if solid)
+
+// 5.16s - using empty bitmasks to early out
+// 5.01s - using solid & empty bitmasks to early out - "foo"
+// 4.64s - empty bitmask only, test 8 at a time, then test geom
+// 4.72s - empty bitmask only, 8 at a time, then test bits
+// 4.46s - split bitmask building into three loops (each byte is separate)
+// 4.42s - further optimize computing bitmask
+
+// 4.58s - using solid & empty bitmasks to early out, same as "foo" but faster bitmask building
+// 4.12s - using solid & empty bitmasks to efficiently test neighbors
+// 4.04s - using 16-bit fetches (not endian-independent)
+// - note this is first place that beats previous best '4.10s - 34x34 1 mesh'
+
+// 4.30s - current time with bitmasks disabled again (note was 4.10s earlier)
+// 3.95s - bitmasks enabled again, no other changes
+// 4.00s - current time with bitmasks disabled again, no other changes -- wide variation that is time dependent?
+// (note that most of the numbers listed here are median of 3 values already)
+// 3.98s - bitmasks enabled
+
+// Bitmasks removed from the code as not worth the complexity increase
+
+
+
+// Raw data for Q&A:
+//
+// 26% parsing & loading minecraft files (4/5ths of which is zlib decode)
+// 39% building mesh from stb input format
+// 18% converting from minecraft blocks to stb blocks
+// 9% reordering from minecraft axis order to stb axis order
+// 7% uploading vertex buffer to OpenGL
diff --git a/vendor/stb/tests/caveview/cave_parse.c b/vendor/stb/tests/caveview/cave_parse.c
new file mode 100644
index 0000000..e8ae02b
--- /dev/null
+++ b/vendor/stb/tests/caveview/cave_parse.c
@@ -0,0 +1,632 @@
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+
+#define FAST_CHUNK // disabling this enables the old, slower path that deblocks into a regular form
+
+#include "cave_parse.h"
+
+#include "stb_image.h"
+#include "stb.h"
+
+#define NUM_CHUNKS_PER_REGION 32 // only on one axis
+#define NUM_CHUNKS_PER_REGION_LOG2 5
+
+#define NUM_COLUMNS_PER_CHUNK 16
+#define NUM_COLUMNS_PER_CHUNK_LOG2 4
+
+uint32 read_uint32_be(FILE *f)
+{
+ unsigned char data[4];
+ fread(data, 1, 4, f);
+ return (data[0]<<24) + (data[1]<<16) + (data[2]<<8) + data[3];
+}
+
+typedef struct
+{
+ uint8 *data;
+ size_t len;
+ int x,z; // chunk index
+ int refcount; // for multi-threading
+} compressed_chunk;
+
+typedef struct
+{
+ int x,z;
+ uint32 sector_data[NUM_CHUNKS_PER_REGION][NUM_CHUNKS_PER_REGION];
+} region;
+
+size_t cached_compressed=0;
+
+FILE *last_region;
+int last_region_x;
+int last_region_z;
+int opened=0;
+
+static void open_file(int reg_x, int reg_z)
+{
+ if (!opened || last_region_x != reg_x || last_region_z != reg_z) {
+ char filename[256];
+ if (last_region != NULL)
+ fclose(last_region);
+ sprintf(filename, "r.%d.%d.mca", reg_x, reg_z);
+ last_region = fopen(filename, "rb");
+ last_region_x = reg_x;
+ last_region_z = reg_z;
+ opened = 1;
+ }
+}
+
+static region *load_region(int reg_x, int reg_z)
+{
+ region *r;
+ int x,z;
+
+ open_file(reg_x, reg_z);
+
+ r = malloc(sizeof(*r));
+
+ if (last_region == NULL) {
+ memset(r, 0, sizeof(*r));
+ } else {
+ fseek(last_region, 0, SEEK_SET);
+ for (z=0; z < NUM_CHUNKS_PER_REGION; ++z)
+ for (x=0; x < NUM_CHUNKS_PER_REGION; ++x)
+ r->sector_data[z][x] = read_uint32_be(last_region);
+ }
+ r->x = reg_x;
+ r->z = reg_z;
+
+ return r;
+}
+
+void free_region(region *r)
+{
+ free(r);
+}
+
+#define MAX_MAP_REGIONS 64 // in one axis: 64 regions * 32 chunk/region * 16 columns/chunk = 16384 columns
+region *regions[MAX_MAP_REGIONS][MAX_MAP_REGIONS];
+
+static region *get_region(int reg_x, int reg_z)
+{
+ int slot_x = reg_x & (MAX_MAP_REGIONS-1);
+ int slot_z = reg_z & (MAX_MAP_REGIONS-1);
+ region *r;
+
+ r = regions[slot_z][slot_x];
+
+ if (r) {
+ if (r->x == reg_x && r->z == reg_z)
+ return r;
+ free_region(r);
+ }
+
+ r = load_region(reg_x, reg_z);
+ regions[slot_z][slot_x] = r;
+
+ return r;
+}
+
+// about one region, so size should be ok
+#define NUM_CACHED_X 64
+#define NUM_CACHED_Z 64
+
+// @TODO: is it really worth caching these? we probably can just
+// pull them from the disk cache nearly as efficiently.
+// Can test that by setting to 1x1?
+compressed_chunk *cached_chunk[NUM_CACHED_Z][NUM_CACHED_X];
+
+static void deref_compressed_chunk(compressed_chunk *cc)
+{
+ assert(cc->refcount > 0);
+ --cc->refcount;
+ if (cc->refcount == 0) {
+ if (cc->data)
+ free(cc->data);
+ free(cc);
+ }
+}
+
+static compressed_chunk *get_compressed_chunk(int chunk_x, int chunk_z)
+{
+ int slot_x = chunk_x & (NUM_CACHED_X-1);
+ int slot_z = chunk_z & (NUM_CACHED_Z-1);
+ compressed_chunk *cc = cached_chunk[slot_z][slot_x];
+
+ if (cc && cc->x == chunk_x && cc->z == chunk_z)
+ return cc;
+ else {
+ int reg_x = chunk_x >> NUM_CHUNKS_PER_REGION_LOG2;
+ int reg_z = chunk_z >> NUM_CHUNKS_PER_REGION_LOG2;
+ region *r = get_region(reg_x, reg_z);
+ if (cc) {
+ deref_compressed_chunk(cc);
+ cached_chunk[slot_z][slot_x] = NULL;
+ }
+ cc = malloc(sizeof(*cc));
+ cc->x = chunk_x;
+ cc->z = chunk_z;
+ {
+ int subchunk_x = chunk_x & (NUM_CHUNKS_PER_REGION-1);
+ int subchunk_z = chunk_z & (NUM_CHUNKS_PER_REGION-1);
+ uint32 code = r->sector_data[subchunk_z][subchunk_x];
+
+ if (code & 255) {
+ open_file(reg_x, reg_z);
+ fseek(last_region, (code>>8)*4096, SEEK_SET);
+ cc->len = (code&255)*4096;
+ cc->data = malloc(cc->len);
+ fread(cc->data, 1, cc->len, last_region);
+ } else {
+ cc->len = 0;
+ cc->data = 0;
+ }
+ }
+ cc->refcount = 1;
+ cached_chunk[slot_z][slot_x] = cc;
+ return cc;
+ }
+}
+
+
+// NBT parser -- can automatically parse stuff we don't
+// have definitions for, but want to explicitly parse
+// stuff we do have definitions for.
+//
+// option 1: auto-parse everything into data structures,
+// then read those
+//
+// option 2: have a "parse next object" which
+// doesn't resolve whether it expands its children
+// yet, and then the user either says "expand" or
+// "skip" after looking at the name. Anything with
+// "children" without names can't go through this
+// interface.
+//
+// Let's try option 2.
+
+
+typedef struct
+{
+ unsigned char *buffer_start;
+ unsigned char *buffer_end;
+ unsigned char *cur;
+ int nesting;
+ char temp_buffer[256];
+} nbt;
+
+enum { TAG_End=0, TAG_Byte=1, TAG_Short=2, TAG_Int=3, TAG_Long=4,
+ TAG_Float=5, TAG_Double=6, TAG_Byte_Array=7, TAG_String=8,
+ TAG_List=9, TAG_Compound=10, TAG_Int_Array=11 };
+
+static void nbt_get_string_data(unsigned char *data, char *buffer, size_t bufsize)
+{
+ int len = data[0]*256 + data[1];
+ int i;
+ for (i=0; i < len && i+1 < (int) bufsize; ++i)
+ buffer[i] = (char) data[i+2];
+ buffer[i] = 0;
+}
+
+static char *nbt_peek(nbt *n)
+{
+ unsigned char type = *n->cur;
+ if (type == TAG_End)
+ return NULL;
+ nbt_get_string_data(n->cur+1, n->temp_buffer, sizeof(n->temp_buffer));
+ return n->temp_buffer;
+}
+
+static uint32 nbt_parse_uint32(unsigned char *buffer)
+{
+ return (buffer[0] << 24) + (buffer[1]<<16) + (buffer[2]<<8) + buffer[3];
+}
+
+static void nbt_skip(nbt *n);
+
+// skip an item that doesn't have an id or name prefix (usable in lists)
+static void nbt_skip_raw(nbt *n, unsigned char type)
+{
+ switch (type) {
+ case TAG_Byte : n->cur += 1; break;
+ case TAG_Short : n->cur += 2; break;
+ case TAG_Int : n->cur += 4; break;
+ case TAG_Long : n->cur += 8; break;
+ case TAG_Float : n->cur += 4; break;
+ case TAG_Double: n->cur += 8; break;
+ case TAG_Byte_Array: n->cur += 4 + 1*nbt_parse_uint32(n->cur); break;
+ case TAG_Int_Array : n->cur += 4 + 4*nbt_parse_uint32(n->cur); break;
+ case TAG_String : n->cur += 2 + (n->cur[0]*256 + n->cur[1]); break;
+ case TAG_List : {
+ unsigned char list_type = *n->cur++;
+ unsigned int list_len = nbt_parse_uint32(n->cur);
+ unsigned int i;
+ n->cur += 4; // list_len
+ for (i=0; i < list_len; ++i)
+ nbt_skip_raw(n, list_type);
+ break;
+ }
+ case TAG_Compound : {
+ while (*n->cur != TAG_End)
+ nbt_skip(n);
+ nbt_skip(n); // skip the TAG_end
+ break;
+ }
+ }
+ assert(n->cur <= n->buffer_end);
+}
+
+static void nbt_skip(nbt *n)
+{
+ unsigned char type = *n->cur++;
+ if (type == TAG_End)
+ return;
+ // skip name
+ n->cur += (n->cur[0]*256 + n->cur[1]) + 2;
+ nbt_skip_raw(n, type);
+}
+
+// byteswap
+static void nbt_swap(unsigned char *ptr, int len)
+{
+ int i;
+ for (i=0; i < (len>>1); ++i) {
+ unsigned char t = ptr[i];
+ ptr[i] = ptr[len-1-i];
+ ptr[len-1-i] = t;
+ }
+}
+
+// pass in the expected type; asserts if it doesn't match
+// returns a pointer to the data, byteswapped if appropriate
+static void *nbt_get_fromlist(nbt *n, unsigned char type, int *len)
+{
+ unsigned char *ptr;
+ assert(type != TAG_Compound);
+ assert(type != TAG_List); // we could support getting lists of primitives as if they were arrays, but eh
+ if (len) *len = 1;
+ ptr = n->cur;
+ switch (type) {
+ case TAG_Byte : break;
+
+ case TAG_Short : nbt_swap(ptr, 2); break;
+ case TAG_Int : nbt_swap(ptr, 4); break;
+ case TAG_Long : nbt_swap(ptr, 8); break;
+ case TAG_Float : nbt_swap(ptr, 4); break;
+ case TAG_Double: nbt_swap(ptr, 8); break;
+
+ case TAG_Byte_Array:
+ *len = nbt_parse_uint32(ptr);
+ ptr += 4;
+ break;
+ case TAG_Int_Array: {
+ int i;
+ *len = nbt_parse_uint32(ptr);
+ ptr += 4;
+ for (i=0; i < *len; ++i)
+ nbt_swap(ptr + 4*i, 4);
+ break;
+ }
+
+ default: assert(0); // unhandled case
+ }
+ nbt_skip_raw(n, type);
+ return ptr;
+}
+
+static void *nbt_get(nbt *n, unsigned char type, int *len)
+{
+ assert(n->cur[0] == type);
+ n->cur += 3 + (n->cur[1]*256+n->cur[2]);
+ return nbt_get_fromlist(n, type, len);
+}
+
+static void nbt_begin_compound(nbt *n) // start a compound
+{
+ assert(*n->cur == TAG_Compound);
+ // skip header
+ n->cur += 3 + (n->cur[1]*256 + n->cur[2]);
+ ++n->nesting;
+}
+
+static void nbt_begin_compound_in_list(nbt *n) // start a compound
+{
+ ++n->nesting;
+}
+
+static void nbt_end_compound(nbt *n) // end a compound
+{
+ assert(*n->cur == TAG_End);
+ assert(n->nesting != 0);
+ ++n->cur;
+ --n->nesting;
+}
+
+// @TODO no interface to get lists from lists
+static int nbt_begin_list(nbt *n, unsigned char type)
+{
+ uint32 len;
+ unsigned char *ptr;
+
+ ptr = n->cur + 3 + (n->cur[1]*256 + n->cur[2]);
+ if (ptr[0] != type)
+ return -1;
+ n->cur = ptr;
+ len = nbt_parse_uint32(n->cur+1);
+ assert(n->cur[0] == type);
+ // @TODO keep a stack with the count to make sure they do it right
+ ++n->nesting;
+ n->cur += 5;
+ return (int) len;
+}
+
+static void nbt_end_list(nbt *n)
+{
+ --n->nesting;
+}
+
+// raw_block chunk is 16x256x16x4 = 2^(4+8+4+2) = 256KB
+//
+// if we want to process 64x64x256 at a time, that will be:
+// 4*4*256KB => 4MB per area in raw_block
+//
+// (plus we maybe need to decode adjacent regions)
+
+
+#ifdef FAST_CHUNK
+typedef fast_chunk parse_chunk;
+#else
+typedef chunk parse_chunk;
+#endif
+
+static parse_chunk *minecraft_chunk_parse(unsigned char *data, size_t len)
+{
+ char *s;
+ parse_chunk *c = NULL;
+
+ nbt n_store, *n = &n_store;
+ n->buffer_start = data;
+ n->buffer_end = data + len;
+ n->cur = n->buffer_start;
+ n->nesting = 0;
+
+ nbt_begin_compound(n);
+ while ((s = nbt_peek(n)) != NULL) {
+ if (!strcmp(s, "Level")) {
+ int *height, len;
+ c = malloc(sizeof(*c));
+ #ifdef FAST_CHUNK
+ memset(c, 0, sizeof(*c));
+ c->pointer_to_free = data;
+ #else
+ c->rb[15][15][255].block = 0;
+ #endif
+ c->max_y = 0;
+
+ nbt_begin_compound(n);
+ while ((s = nbt_peek(n)) != NULL) {
+ if (!strcmp(s, "xPos"))
+ c->xpos = *(int *) nbt_get(n, TAG_Int, 0);
+ else if (!strcmp(s, "zPos"))
+ c->zpos = *(int *) nbt_get(n, TAG_Int, 0);
+ else if (!strcmp(s, "Sections")) {
+ int count = nbt_begin_list(n, TAG_Compound), i;
+ if (count == -1) {
+ // this not-a-list case happens in The End and I'm not sure
+ // what it means... possibly one of those silly encodings
+ // where it's not encoded as a list if there's only one?
+ // not worth figuring out
+ nbt_skip(n);
+ count = -1;
+ }
+ for (i=0; i < count; ++i) {
+ int yi, len;
+ uint8 *light = NULL, *blocks = NULL, *data = NULL, *skylight = NULL;
+ nbt_begin_compound_in_list(n);
+ while ((s = nbt_peek(n)) != NULL) {
+ if (!strcmp(s, "Y"))
+ yi = * (uint8 *) nbt_get(n, TAG_Byte, 0);
+ else if (!strcmp(s, "BlockLight")) {
+ light = nbt_get(n, TAG_Byte_Array, &len);
+ assert(len == 2048);
+ } else if (!strcmp(s, "Blocks")) {
+ blocks = nbt_get(n, TAG_Byte_Array, &len);
+ assert(len == 4096);
+ } else if (!strcmp(s, "Data")) {
+ data = nbt_get(n, TAG_Byte_Array, &len);
+ assert(len == 2048);
+ } else if (!strcmp(s, "SkyLight")) {
+ skylight = nbt_get(n, TAG_Byte_Array, &len);
+ assert(len == 2048);
+ }
+ }
+ nbt_end_compound(n);
+
+ assert(yi < 16);
+
+ #ifndef FAST_CHUNK
+
+ // clear data below current max_y
+ {
+ int x,z;
+ while (c->max_y < yi*16) {
+ for (x=0; x < 16; ++x)
+ for (z=0; z < 16; ++z)
+ c->rb[z][x][c->max_y].block = 0;
+ ++c->max_y;
+ }
+ }
+
+ // now assemble the data
+ {
+ int x,y,z, o2=0,o4=0;
+ for (y=0; y < 16; ++y) {
+ for (z=0; z < 16; ++z) {
+ for (x=0; x < 16; x += 2) {
+ raw_block *rb = &c->rb[15-z][x][y + yi*16]; // 15-z because switching to z-up will require flipping an axis
+ rb[0].block = blocks[o4];
+ rb[0].light = light[o2] & 15;
+ rb[0].data = data[o2] & 15;
+ rb[0].skylight = skylight[o2] & 15;
+
+ rb[256].block = blocks[o4+1];
+ rb[256].light = light[o2] >> 4;
+ rb[256].data = data[o2] >> 4;
+ rb[256].skylight = skylight[o2] >> 4;
+
+ o2 += 1;
+ o4 += 2;
+ }
+ }
+ }
+ c->max_y += 16;
+ }
+ #else
+ c->blockdata[yi] = blocks;
+ c->data [yi] = data;
+ c->light [yi] = light;
+ c->skylight [yi] = skylight;
+ #endif
+ }
+ //nbt_end_list(n);
+ } else if (!strcmp(s, "HeightMap")) {
+ height = nbt_get(n, TAG_Int_Array, &len);
+ assert(len == 256);
+ } else
+ nbt_skip(n);
+ }
+ nbt_end_compound(n);
+
+ } else
+ nbt_skip(n);
+ }
+ nbt_end_compound(n);
+ assert(n->cur == n->buffer_end);
+ return c;
+}
+
+#define MAX_DECODED_CHUNK_X 64
+#define MAX_DECODED_CHUNK_Z 64
+
+typedef struct
+{
+ int cx,cz;
+ fast_chunk *fc;
+ int valid;
+} decoded_buffer;
+
+static decoded_buffer decoded_buffers[MAX_DECODED_CHUNK_Z][MAX_DECODED_CHUNK_X];
+void lock_chunk_get_mutex(void);
+void unlock_chunk_get_mutex(void);
+
+#ifdef FAST_CHUNK
+fast_chunk *get_decoded_fastchunk_uncached(int chunk_x, int chunk_z)
+{
+ unsigned char *decoded;
+ compressed_chunk *cc;
+ int inlen;
+ int len;
+ fast_chunk *fc;
+
+ lock_chunk_get_mutex();
+ cc = get_compressed_chunk(chunk_x, chunk_z);
+ if (cc->len != 0)
+ ++cc->refcount;
+ unlock_chunk_get_mutex();
+
+ if (cc->len == 0)
+ return NULL;
+
+ assert(cc != NULL);
+
+ assert(cc->data[4] == 2);
+
+ inlen = nbt_parse_uint32(cc->data);
+ decoded = stbi_zlib_decode_malloc_guesssize(cc->data+5, inlen, inlen*3, &len);
+ assert(decoded != NULL);
+ assert(len != 0);
+
+ lock_chunk_get_mutex();
+ deref_compressed_chunk(cc);
+ unlock_chunk_get_mutex();
+
+ #ifdef FAST_CHUNK
+ fc = minecraft_chunk_parse(decoded, len);
+ #else
+ fc = NULL;
+ #endif
+ if (fc == NULL)
+ free(decoded);
+ return fc;
+}
+
+
+decoded_buffer *get_decoded_buffer(int chunk_x, int chunk_z)
+{
+ decoded_buffer *db = &decoded_buffers[chunk_z&(MAX_DECODED_CHUNK_Z-1)][chunk_x&(MAX_DECODED_CHUNK_X-1)];
+ if (db->valid) {
+ if (db->cx == chunk_x && db->cz == chunk_z)
+ return db;
+ if (db->fc) {
+ free(db->fc->pointer_to_free);
+ free(db->fc);
+ }
+ }
+
+ db->cx = chunk_x;
+ db->cz = chunk_z;
+ db->valid = 1;
+ db->fc = 0;
+
+ {
+ db->fc = get_decoded_fastchunk_uncached(chunk_x, chunk_z);
+ return db;
+ }
+}
+
+fast_chunk *get_decoded_fastchunk(int chunk_x, int chunk_z)
+{
+ decoded_buffer *db = get_decoded_buffer(chunk_x, chunk_z);
+ return db->fc;
+}
+#endif
+
+#ifndef FAST_CHUNK
+chunk *get_decoded_chunk_raw(int chunk_x, int chunk_z)
+{
+ unsigned char *decoded;
+ compressed_chunk *cc = get_compressed_chunk(chunk_x, chunk_z);
+ assert(cc != NULL);
+ if (cc->len == 0)
+ return NULL;
+ else {
+ chunk *ch;
+ int inlen = nbt_parse_uint32(cc->data);
+ int len;
+ assert(cc->data[4] == 2);
+ decoded = stbi_zlib_decode_malloc_guesssize(cc->data+5, inlen, inlen*3, &len);
+ assert(decoded != NULL);
+ #ifdef FAST_CHUNK
+ ch = NULL;
+ #else
+ ch = minecraft_chunk_parse(decoded, len);
+ #endif
+ free(decoded);
+ return ch;
+ }
+}
+
+static chunk *decoded_chunks[MAX_DECODED_CHUNK_Z][MAX_DECODED_CHUNK_X];
+chunk *get_decoded_chunk(int chunk_x, int chunk_z)
+{
+ chunk *c = decoded_chunks[chunk_z&(MAX_DECODED_CHUNK_Z-1)][chunk_x&(MAX_DECODED_CHUNK_X-1)];
+ if (c && c->xpos == chunk_x && c->zpos == chunk_z)
+ return c;
+ if (c) free(c);
+ c = get_decoded_chunk_raw(chunk_x, chunk_z);
+ decoded_chunks[chunk_z&(MAX_DECODED_CHUNK_Z-1)][chunk_x&(MAX_DECODED_CHUNK_X-1)] = c;
+ return c;
+}
+#endif
diff --git a/vendor/stb/tests/caveview/cave_parse.h b/vendor/stb/tests/caveview/cave_parse.h
new file mode 100644
index 0000000..4cdfe2a
--- /dev/null
+++ b/vendor/stb/tests/caveview/cave_parse.h
@@ -0,0 +1,41 @@
+#ifndef INCLUDE_CAVE_PARSE_H
+#define INCLUDE_CAVE_PARSE_H
+
+typedef struct
+{
+ unsigned char block;
+ unsigned char data;
+ unsigned char light:4;
+ unsigned char skylight:4;
+} raw_block;
+
+// this is the old fully-decoded chunk
+typedef struct
+{
+ int xpos, zpos, max_y;
+ int height[16][16];
+ raw_block rb[16][16][256]; // [z][x][y] which becomes [y][x][z] in stb
+} chunk;
+
+chunk *get_decoded_chunk(int chunk_x, int chunk_z);
+
+#define NUM_SEGMENTS 16
+typedef struct
+{
+ int max_y, xpos, zpos;
+
+ unsigned char *blockdata[NUM_SEGMENTS];
+ unsigned char *data[NUM_SEGMENTS];
+ unsigned char *skylight[NUM_SEGMENTS];
+ unsigned char *light[NUM_SEGMENTS];
+
+ void *pointer_to_free;
+
+ int refcount; // this allows multi-threaded building without wrapping in ANOTHER struct
+} fast_chunk;
+
+fast_chunk *get_decoded_fastchunk(int chunk_x, int chunk_z); // cache, never call free()
+
+fast_chunk *get_decoded_fastchunk_uncached(int chunk_x, int chunk_z);
+
+#endif
diff --git a/vendor/stb/tests/caveview/cave_render.c b/vendor/stb/tests/caveview/cave_render.c
new file mode 100644
index 0000000..3ed4628
--- /dev/null
+++ b/vendor/stb/tests/caveview/cave_render.c
@@ -0,0 +1,951 @@
+// This file renders vertex buffers, converts raw meshes
+// to GL meshes, and manages threads that do the raw-mesh
+// building (found in cave_mesher.c)
+
+
+#include "stb_voxel_render.h"
+
+#define STB_GLEXT_DECLARE "glext_list.h"
+#include "stb_gl.h"
+#include "stb_image.h"
+#include "stb_glprog.h"
+
+#include "caveview.h"
+#include "cave_parse.h"
+#include "stb.h"
+#include "sdl.h"
+#include "sdl_thread.h"
+#include <assert.h>
+#include <math.h>
+
+//#define STBVOX_CONFIG_TEX1_EDGE_CLAMP
+
+
+// currently no dynamic way to set mesh cache size or view distance
+//#define SHORTVIEW
+
+
+stbvox_mesh_maker g_mesh_maker;
+
+GLuint main_prog;
+GLint uniform_locations[64];
+
+//#define MAX_QUADS_PER_DRAW (65536 / 4) // assuming 16-bit indices, 4 verts per quad
+//#define FIXED_INDEX_BUFFER_SIZE (MAX_QUADS_PER_DRAW * 6 * 2) // 16*1024 * 12 == ~192KB
+
+// while uploading texture data, this holds each texture as we build it
+#define TEX_SIZE 64
+uint32 texture[TEX_SIZE][TEX_SIZE];
+
+GLuint voxel_tex[2];
+
+// chunk state
+enum
+{
+ STATE_invalid,
+ STATE_needed,
+ STATE_requested,
+ STATE_abandoned,
+ STATE_valid,
+};
+
+// mesh is 32x32x255 ... this is hardcoded in that
+// a mesh covers 2x2 minecraft chunks, no #defines for it
+typedef struct
+{
+ int state;
+ int chunk_x, chunk_y;
+ int num_quads;
+ float priority;
+ int vbuf_size, fbuf_size;
+
+ float transform[3][3];
+ float bounds[2][3];
+
+ GLuint vbuf;// vbuf_tex;
+ GLuint fbuf, fbuf_tex;
+
+} chunk_mesh;
+
+void scale_texture(unsigned char *src, int x, int y, int w, int h)
+{
+ int i,j,k;
+ assert(w == 256 && h == 256);
+ for (j=0; j < TEX_SIZE; ++j) {
+ for (i=0; i < TEX_SIZE; ++i) {
+ uint32 val=0;
+ for (k=0; k < 4; ++k) {
+ val >>= 8;
+ val += src[ 4*(x+(i>>2)) + 4*w*(y+(j>>2)) + k]<<24;
+ }
+ texture[j][i] = val;
+ }
+ }
+}
+
+void build_base_texture(int n)
+{
+ int x,y;
+ uint32 color = stb_rand() | 0x808080;
+ for (y=0; y < TEX_SIZE; ++y)
+ for (x=0; x < TEX_SIZE; ++x)
+ texture[y][x] = color;
+}
+
+// cache of uploaded meshes, direct-mapped by mesh coordinate
+#define CACHED_MESH_NUM_X 128
+#define CACHED_MESH_NUM_Y 128
+
+chunk_mesh cached_chunk_mesh[CACHED_MESH_NUM_Y][CACHED_MESH_NUM_X];
+
+void free_chunk(int slot_x, int slot_y)
+{
+ chunk_mesh *cm = &cached_chunk_mesh[slot_y][slot_x];
+ if (cm->state == STATE_valid) {
+ glDeleteTextures(1, &cm->fbuf_tex);
+ glDeleteBuffersARB(1, &cm->vbuf);
+ glDeleteBuffersARB(1, &cm->fbuf);
+ cached_chunk_mesh[slot_y][slot_x].state = STATE_invalid;
+ }
+}
+
+void upload_mesh(chunk_mesh *cm, uint8 *build_buffer, uint8 *face_buffer)
+{
+ glGenBuffersARB(1, &cm->vbuf);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, cm->vbuf);
+ glBufferDataARB(GL_ARRAY_BUFFER_ARB, cm->num_quads*4*sizeof(uint32), build_buffer, GL_STATIC_DRAW_ARB);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
+
+ glGenBuffersARB(1, &cm->fbuf);
+ glBindBufferARB(GL_TEXTURE_BUFFER_ARB, cm->fbuf);
+ glBufferDataARB(GL_TEXTURE_BUFFER_ARB, cm->num_quads*sizeof(uint32), face_buffer , GL_STATIC_DRAW_ARB);
+ glBindBufferARB(GL_TEXTURE_BUFFER_ARB, 0);
+
+ glGenTextures(1, &cm->fbuf_tex);
+ glBindTexture(GL_TEXTURE_BUFFER_ARB, cm->fbuf_tex);
+ glTexBufferARB(GL_TEXTURE_BUFFER_ARB, GL_RGBA8UI, cm->fbuf);
+ glBindTexture(GL_TEXTURE_BUFFER_ARB, 0);
+}
+
+static void upload_mesh_data(raw_mesh *rm)
+{
+ int cx = rm->cx;
+ int cy = rm->cy;
+ int slot_x = (cx >> 1) & (CACHED_MESH_NUM_X-1);
+ int slot_y = (cy >> 1) & (CACHED_MESH_NUM_Y-1);
+ chunk_mesh *cm;
+
+ free_chunk(slot_x, slot_y);
+
+ cm = &cached_chunk_mesh[slot_y][slot_x];
+ cm->num_quads = rm->num_quads;
+
+ upload_mesh(cm, rm->build_buffer, rm->face_buffer);
+ cm->vbuf_size = rm->num_quads*4*sizeof(uint32);
+ cm->fbuf_size = rm->num_quads*sizeof(uint32);
+ cm->priority = 100000;
+ cm->chunk_x = cx;
+ cm->chunk_y = cy;
+
+ memcpy(cm->bounds, rm->bounds, sizeof(cm->bounds));
+ memcpy(cm->transform, rm->transform, sizeof(cm->transform));
+
+ // write barrier here
+ cm->state = STATE_valid;
+}
+
+GLint uniform_loc[16];
+float table3[128][3];
+float table4[64][4];
+GLint tablei[2];
+
+float step=0;
+
+#ifdef SHORTVIEW
+int view_dist_in_chunks = 50;
+#else
+int view_dist_in_chunks = 80;
+#endif
+
+void setup_uniforms(float pos[3])
+{
+ int i,j;
+ step += 1.0f/60.0f;
+ for (i=0; i < STBVOX_UNIFORM_count; ++i) {
+ stbvox_uniform_info raw, *ui=&raw;
+ stbvox_get_uniform_info(&raw, i);
+ uniform_loc[i] = -1;
+
+ if (i == STBVOX_UNIFORM_texscale || i == STBVOX_UNIFORM_texgen || i == STBVOX_UNIFORM_color_table)
+ continue;
+
+ if (ui) {
+ void *data = ui->default_value;
+ uniform_loc[i] = stbgl_find_uniform(main_prog, ui->name);
+ switch (i) {
+ case STBVOX_UNIFORM_face_data:
+ tablei[0] = 2;
+ data = tablei;
+ break;
+
+ case STBVOX_UNIFORM_tex_array:
+ glActiveTextureARB(GL_TEXTURE0_ARB);
+ glBindTexture(GL_TEXTURE_2D_ARRAY_EXT, voxel_tex[0]);
+ glActiveTextureARB(GL_TEXTURE1_ARB);
+ glBindTexture(GL_TEXTURE_2D_ARRAY_EXT, voxel_tex[1]);
+ glActiveTextureARB(GL_TEXTURE0_ARB);
+ tablei[0] = 0;
+ tablei[1] = 1;
+ data = tablei;
+ break;
+
+ case STBVOX_UNIFORM_color_table:
+ data = ui->default_value;
+ ((float *)data)[63*4+3] = 2.0f; // emissive
+ break;
+
+ case STBVOX_UNIFORM_camera_pos:
+ data = table3[0];
+ table3[0][0] = pos[0];
+ table3[0][1] = pos[1];
+ table3[0][2] = pos[2];
+ table3[0][3] = stb_max(0,(float)sin(step*2)*0.125f);
+ break;
+
+ case STBVOX_UNIFORM_ambient: {
+ float bright = 1.0;
+ //float bright = 0.75;
+ float amb[3][3];
+
+ // ambient direction is sky-colored upwards
+ // "ambient" lighting is from above
+ table4[0][0] = 0.3f;
+ table4[0][1] = -0.5f;
+ table4[0][2] = 0.9f;
+
+ amb[1][0] = 0.3f; amb[1][1] = 0.3f; amb[1][2] = 0.3f; // dark-grey
+ amb[2][0] = 1.0; amb[2][1] = 1.0; amb[2][2] = 1.0; // white
+
+ // convert so (table[1]*dot+table[2]) gives
+ // above interpolation
+ // lerp((dot+1)/2, amb[1], amb[2])
+ // amb[1] + (amb[2] - amb[1]) * (dot+1)/2
+ // amb[1] + (amb[2] - amb[1]) * dot/2 + (amb[2]-amb[1])/2
+
+ for (j=0; j < 3; ++j) {
+ table4[1][j] = (amb[2][j] - amb[1][j])/2 * bright;
+ table4[2][j] = (amb[1][j] + amb[2][j])/2 * bright;
+ }
+
+ // fog color
+ table4[3][0] = 0.6f, table4[3][1] = 0.7f, table4[3][2] = 0.9f;
+ table4[3][3] = 1.0f / (view_dist_in_chunks * 16);
+ table4[3][3] *= table4[3][3];
+
+ data = table4;
+ break;
+ }
+ }
+
+ switch (ui->type) {
+ case STBVOX_UNIFORM_TYPE_sampler: stbglUniform1iv(uniform_loc[i], ui->array_length, data); break;
+ case STBVOX_UNIFORM_TYPE_vec2: stbglUniform2fv(uniform_loc[i], ui->array_length, data); break;
+ case STBVOX_UNIFORM_TYPE_vec3: stbglUniform3fv(uniform_loc[i], ui->array_length, data); break;
+ case STBVOX_UNIFORM_TYPE_vec4: stbglUniform4fv(uniform_loc[i], ui->array_length, data); break;
+ }
+ }
+ }
+}
+
+GLuint unitex[64], unibuf[64];
+void make_texture_buffer_for_uniform(int uniform, int slot)
+{
+ GLenum type;
+ stbvox_uniform_info raw, *ui=&raw;
+ GLint uloc;
+
+ stbvox_get_uniform_info(ui, uniform);
+ uloc = stbgl_find_uniform(main_prog, ui->name);
+
+ if (uniform == STBVOX_UNIFORM_color_table)
+ ((float *)ui->default_value)[63*4+3] = 2.0f; // emissive
+
+ glGenBuffersARB(1, &unibuf[uniform]);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, unibuf[uniform]);
+ glBufferDataARB(GL_ARRAY_BUFFER_ARB, ui->array_length * ui->bytes_per_element, ui->default_value, GL_STATIC_DRAW_ARB);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
+
+ glGenTextures(1, &unitex[uniform]);
+ glBindTexture(GL_TEXTURE_BUFFER_ARB, unitex[uniform]);
+ switch (ui->type) {
+ case STBVOX_UNIFORM_TYPE_vec2: type = GL_RG32F; break;
+ case STBVOX_UNIFORM_TYPE_vec3: type = GL_RGB32F; break;
+ case STBVOX_UNIFORM_TYPE_vec4: type = GL_RGBA32F; break;
+ default: assert(0);
+ }
+ glTexBufferARB(GL_TEXTURE_BUFFER_ARB, type, unibuf[uniform]);
+ glBindTexture(GL_TEXTURE_BUFFER_ARB, 0);
+
+ glActiveTextureARB(GL_TEXTURE0 + slot);
+ glBindTexture(GL_TEXTURE_BUFFER_ARB, unitex[uniform]);
+ glActiveTextureARB(GL_TEXTURE0);
+
+ stbglUseProgram(main_prog);
+ stbglUniform1i(uloc, slot);
+}
+
+#define MAX_MESH_WORKERS 8
+#define MAX_CHUNK_LOAD_WORKERS 2
+
+int num_mesh_workers;
+int num_chunk_load_workers;
+
+typedef struct
+{
+ int state;
+ int request_cx;
+ int request_cy;
+ int padding[13];
+
+ SDL_sem * request_received;
+
+ SDL_sem * chunk_server_done_processing;
+ int chunk_action;
+ int chunk_request_x;
+ int chunk_request_y;
+ fast_chunk *chunks[4][4];
+
+ int padding2[16];
+ raw_mesh rm;
+ int padding3[16];
+
+ uint8 *build_buffer;
+ uint8 *face_buffer ;
+} mesh_worker;
+
+enum
+{
+ WSTATE_idle,
+ WSTATE_requested,
+ WSTATE_running,
+ WSTATE_mesh_ready,
+};
+
+mesh_worker mesh_data[MAX_MESH_WORKERS];
+int num_meshes_started; // stats
+
+int request_chunk(int chunk_x, int chunk_y);
+void update_meshes_from_render_thread(void);
+
+unsigned char tex2_data[64][4];
+
+void init_tex2_gradient(void)
+{
+ int i;
+ for (i=0; i < 16; ++i) {
+ tex2_data[i+ 0][0] = 64 + 12*i;
+ tex2_data[i+ 0][1] = 32;
+ tex2_data[i+ 0][2] = 64;
+
+ tex2_data[i+16][0] = 255;
+ tex2_data[i+16][1] = 32 + 8*i;
+ tex2_data[i+16][2] = 64;
+
+ tex2_data[i+32][0] = 255;
+ tex2_data[i+32][1] = 160;
+ tex2_data[i+32][2] = 64 + 12*i;
+
+ tex2_data[i+48][0] = 255;
+ tex2_data[i+48][1] = 160 + 6*i;
+ tex2_data[i+48][2] = 255;
+ }
+}
+
+void set_tex2_alpha(float fa)
+{
+ int i;
+ int a = (int) stb_lerp(fa, 0, 255);
+ if (a < 0) a = 0; else if (a > 255) a = 255;
+ glBindTexture(GL_TEXTURE_2D_ARRAY_EXT, voxel_tex[1]);
+ for (i=0; i < 64; ++i) {
+ tex2_data[i][3] = a;
+ glTexSubImage3DEXT(GL_TEXTURE_2D_ARRAY_EXT, 0, 0,0,i, 1,1,1, GL_RGBA, GL_UNSIGNED_BYTE, tex2_data[i]);
+ }
+}
+
+void render_init(void)
+{
+ int i;
+ char *binds[] = { "attr_vertex", "attr_face", NULL };
+ char *vertex;
+ char *fragment;
+ int w=0,h=0;
+
+ unsigned char *texdata = stbi_load("terrain.png", &w, &h, NULL, 4);
+
+ stbvox_init_mesh_maker(&g_mesh_maker);
+ for (i=0; i < num_mesh_workers; ++i) {
+ stbvox_init_mesh_maker(&mesh_data[i].rm.mm);
+ }
+
+ vertex = stbvox_get_vertex_shader();
+ fragment = stbvox_get_fragment_shader();
+
+ {
+ char error_buffer[1024];
+ char *main_vertex[] = { vertex, NULL };
+ char *main_fragment[] = { fragment, NULL };
+ main_prog = stbgl_create_program(main_vertex, main_fragment, binds, error_buffer, sizeof(error_buffer));
+ if (main_prog == 0) {
+ ods("Compile error for main shader: %s\n", error_buffer);
+ assert(0);
+ exit(1);
+ }
+ }
+ //init_index_buffer();
+
+ make_texture_buffer_for_uniform(STBVOX_UNIFORM_texscale , 3);
+ make_texture_buffer_for_uniform(STBVOX_UNIFORM_texgen , 4);
+ make_texture_buffer_for_uniform(STBVOX_UNIFORM_color_table , 5);
+
+ glGenTextures(2, voxel_tex);
+
+ glBindTexture(GL_TEXTURE_2D_ARRAY_EXT, voxel_tex[0]);
+ glTexImage3DEXT(GL_TEXTURE_2D_ARRAY_EXT, 0, GL_RGBA,
+ TEX_SIZE,TEX_SIZE,256,
+ 0,GL_RGBA,GL_UNSIGNED_BYTE,NULL);
+ for (i=0; i < 256; ++i) {
+ if (texdata)
+ scale_texture(texdata, (i&15)*w/16, (h/16)*(i>>4), w,h);
+ else
+ build_base_texture(i);
+ glTexSubImage3DEXT(GL_TEXTURE_2D_ARRAY_EXT, 0, 0,0,i, TEX_SIZE,TEX_SIZE,1, GL_RGBA, GL_UNSIGNED_BYTE, texture[0]);
+ }
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MAX_ANISOTROPY_EXT, 16);
+ #ifdef STBVOX_CONFIG_TEX1_EDGE_CLAMP
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
+ #endif
+
+ glGenerateMipmapEXT(GL_TEXTURE_2D_ARRAY_EXT);
+
+ glBindTexture(GL_TEXTURE_2D_ARRAY_EXT, voxel_tex[1]);
+ glTexImage3DEXT(GL_TEXTURE_2D_ARRAY_EXT, 0, GL_RGBA,
+ 1,1,64,
+ 0,GL_RGBA,GL_UNSIGNED_BYTE,NULL);
+ init_tex2_gradient();
+ set_tex2_alpha(0.0);
+ #if 0
+ for (i=0; i < 128; ++i) {
+ //build_overlay_texture(i);
+ glTexSubImage3DEXT(GL_TEXTURE_2D_ARRAY_EXT, 0, 0,0,i, TEX_SIZE,TEX_SIZE,1, GL_RGBA, GL_UNSIGNED_BYTE, texture[0]);
+ }
+ #endif
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
+ glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
+ glGenerateMipmapEXT(GL_TEXTURE_2D_ARRAY_EXT);
+}
+
+void world_init(void)
+{
+ int a,b,x,y;
+
+ Uint64 start_time, end_time;
+ #ifdef NDEBUG
+ int range = 32;
+ #else
+ int range = 12;
+ #endif
+
+ start_time = SDL_GetPerformanceCounter();
+
+ // iterate in 8x8 clusters of qchunks at a time to get better converted-chunk-cache
+ // reuse than a purely row-by-row ordering gives (single-threaded this is a bigger win
+ // than any of the above optimizations were, since it halves zlib/mc-conversion costs)
+ for (x=-range; x <= range; x += 16)
+ for (y=-range; y <= range; y += 16)
+ for (b=y; b < y+16 && b <= range; b += 2)
+ for (a=x; a < x+16 && a <= range; a += 2)
+ while (!request_chunk(a, b)) { // if request fails, all threads are busy
+ update_meshes_from_render_thread();
+ SDL_Delay(1);
+ }
+
+ // wait until all the workers are done
+ // (this is only needed if we want to time
+ // when the build finishes, or when we want to reset the
+ // cache size; otherwise we could just go ahead and
+ // start rendering whatever we've got)
+ for(;;) {
+ int i;
+ update_meshes_from_render_thread();
+ for (i=0; i < num_mesh_workers; ++i)
+ if (mesh_data[i].state != WSTATE_idle)
+ break;
+ if (i == num_mesh_workers)
+ break;
+ SDL_Delay(3);
+ }
+
+ end_time = SDL_GetPerformanceCounter();
+ ods("Build time: %7.2fs\n", (end_time - start_time) / (float) SDL_GetPerformanceFrequency());
+
+ // don't waste lots of storage on chunk caches once start-up is finished;
+ // the cache only needed to be this large because we worked in large blocks
+ // to maximize sharing
+ reset_cache_size(32);
+}
+
+extern SDL_mutex * chunk_cache_mutex;
+
+int mesh_worker_handler(void *data)
+{
+ mesh_worker *mw = data;
+ mw->face_buffer = malloc(FACE_BUFFER_SIZE);
+ mw->build_buffer = malloc(BUILD_BUFFER_SIZE);
+
+ // this loop only works because the compiler can't
+ // tell that the SDL_ calls don't access mw->state;
+ // really we should use a memory barrier for that
+ for(;;) {
+ int i,j;
+ int cx,cy;
+
+ // wait for a chunk request
+ SDL_SemWait(mw->request_received);
+
+ // analyze the chunk request
+ assert(mw->state == WSTATE_requested);
+ cx = mw->request_cx;
+ cy = mw->request_cy;
+
+ // this is inaccurate as it can block while another thread has the cache locked
+ mw->state = WSTATE_running;
+
+ // get the chunks we need (this takes a lock and caches them)
+ for (j=0; j < 4; ++j)
+ for (i=0; i < 4; ++i)
+ mw->chunks[j][i] = get_converted_fastchunk(cx-1 + i, cy-1 + j);
+
+ // build the mesh based on the chunks
+ mw->rm.build_buffer = mw->build_buffer;
+ mw->rm.face_buffer = mw->face_buffer;
+ build_chunk(cx, cy, mw->chunks, &mw->rm);
+ mw->state = WSTATE_mesh_ready;
+ // don't need to notify of this, because it gets polled
+
+ // when done, free the chunks
+
+ // for efficiency we just take the mutex once around the whole thing,
+ // though this spreads the mutex logic over two files
+ SDL_LockMutex(chunk_cache_mutex);
+ for (j=0; j < 4; ++j)
+ for (i=0; i < 4; ++i) {
+ deref_fastchunk(mw->chunks[j][i]);
+ mw->chunks[j][i] = NULL;
+ }
+ SDL_UnlockMutex(chunk_cache_mutex);
+ }
+ return 0;
+}
+
+int request_chunk(int chunk_x, int chunk_y)
+{
+ int i;
+ for (i=0; i < num_mesh_workers; ++i) {
+ mesh_worker *mw = &mesh_data[i];
+ if (mw->state == WSTATE_idle) {
+ mw->request_cx = chunk_x;
+ mw->request_cy = chunk_y;
+ mw->state = WSTATE_requested;
+ SDL_SemPost(mw->request_received);
+ ++num_meshes_started;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+void prepare_threads(void)
+{
+ int i;
+ int num_proc = SDL_GetCPUCount();
+
+ if (num_proc > 6)
+ num_mesh_workers = num_proc/2;
+ else if (num_proc > 4)
+ num_mesh_workers = 4;
+ else
+ num_mesh_workers = num_proc-1;
+
+// @TODO
+// Thread usage is probably pretty terrible; need to make a
+// separate queue of needed chunks, instead of just generating
+// one request per thread per frame, and a separate queue of
+// results. (E.g. If it takes 1.5 frames to build mesh, thread
+// is idle for 0.5 frames.) To fake this for now, I've just
+// doubled the number of threads to let those serve as a 'queue',
+// but that's dumb.
+
+ num_mesh_workers *= 2; // try to get better thread usage
+
+ if (num_mesh_workers > MAX_MESH_WORKERS)
+ num_mesh_workers = MAX_MESH_WORKERS;
+
+ for (i=0; i < num_mesh_workers; ++i) {
+ mesh_worker *data = &mesh_data[i];
+ data->request_received = SDL_CreateSemaphore(0);
+ data->chunk_server_done_processing = SDL_CreateSemaphore(0);
+ SDL_CreateThread(mesh_worker_handler, "mesh worker", data);
+ }
+}
+
+
+// "better" buffer uploading
+#if 0
+ if (glBufferStorage) {
+ glDeleteBuffersARB(1, &vb->vbuf);
+ glGenBuffersARB(1, &vb->vbuf);
+
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, vb->vbuf);
+ glBufferStorage(GL_ARRAY_BUFFER_ARB, sizeof(build_buffer), build_buffer, 0);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
+ } else {
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, vb->vbuf);
+ glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(build_buffer), build_buffer, GL_STATIC_DRAW_ARB);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
+ }
+#endif
+
+
+typedef struct
+{
+ float x,y,z,w;
+} plane;
+
+static plane frustum[6];
+
+static void matd_mul(double out[4][4], double src1[4][4], double src2[4][4])
+{
+ int i,j,k;
+ for (j=0; j < 4; ++j) {
+ for (i=0; i < 4; ++i) {
+ double t=0;
+ for (k=0; k < 4; ++k)
+ t += src1[k][i] * src2[j][k];
+ out[i][j] = t;
+ }
+ }
+}
+
+// https://fgiesen.wordpress.com/2012/08/31/frustum-planes-from-the-projection-matrix/
+static void compute_frustum(void)
+{
+ int i;
+ GLdouble mv[4][4],proj[4][4], mvproj[4][4];
+ glGetDoublev(GL_MODELVIEW_MATRIX , mv[0]);
+ glGetDoublev(GL_PROJECTION_MATRIX, proj[0]);
+ matd_mul(mvproj, proj, mv);
+ for (i=0; i < 4; ++i) {
+ (&frustum[0].x)[i] = (float) (mvproj[3][i] + mvproj[0][i]);
+ (&frustum[1].x)[i] = (float) (mvproj[3][i] - mvproj[0][i]);
+ (&frustum[2].x)[i] = (float) (mvproj[3][i] + mvproj[1][i]);
+ (&frustum[3].x)[i] = (float) (mvproj[3][i] - mvproj[1][i]);
+ (&frustum[4].x)[i] = (float) (mvproj[3][i] + mvproj[2][i]);
+ (&frustum[5].x)[i] = (float) (mvproj[3][i] - mvproj[2][i]);
+ }
+}
+
+static int test_plane(plane *p, float x0, float y0, float z0, float x1, float y1, float z1)
+{
+ // return false if the box is entirely behind the plane
+ float d=0;
+ assert(x0 <= x1 && y0 <= y1 && z0 <= z1);
+ if (p->x > 0) d += x1*p->x; else d += x0*p->x;
+ if (p->y > 0) d += y1*p->y; else d += y0*p->y;
+ if (p->z > 0) d += z1*p->z; else d += z0*p->z;
+ return d + p->w >= 0;
+}
+
+static int is_box_in_frustum(float *bmin, float *bmax)
+{
+ int i;
+ for (i=0; i < 6; ++i)
+ if (!test_plane(&frustum[i], bmin[0], bmin[1], bmin[2], bmax[0], bmax[1], bmax[2]))
+ return 0;
+ return 1;
+}
+
+float compute_priority(int cx, int cy, float x, float y)
+{
+ float distx, disty, dist2;
+ distx = (cx*16+8) - x;
+ disty = (cy*16+8) - y;
+ dist2 = distx*distx + disty*disty;
+ return view_dist_in_chunks*view_dist_in_chunks * 16 * 16 - dist2;
+}
+
+int chunk_locations, chunks_considered, chunks_in_frustum;
+int quads_considered, quads_rendered;
+int chunk_storage_rendered, chunk_storage_considered, chunk_storage_total;
+int update_frustum = 1;
+
+#ifdef SHORTVIEW
+int max_chunk_storage = 450 << 20;
+int min_chunk_storage = 350 << 20;
+#else
+int max_chunk_storage = 900 << 20;
+int min_chunk_storage = 800 << 20;
+#endif
+
+float min_priority = -500; // this really wants to be in unit space, not squared space
+
+int num_meshes_uploaded;
+
+void update_meshes_from_render_thread(void)
+{
+ int i;
+ for (i=0; i < num_mesh_workers; ++i) {
+ mesh_worker *mw = &mesh_data[i];
+ if (mw->state == WSTATE_mesh_ready) {
+ upload_mesh_data(&mw->rm);
+ ++num_meshes_uploaded;
+ mw->state = WSTATE_idle;
+ }
+ }
+}
+
+extern float tex2_alpha;
+extern int global_hack;
+int num_threads_active;
+float chunk_server_activity;
+
+void render_caves(float campos[3])
+{
+ float x = campos[0], y = campos[1];
+ int qchunk_x, qchunk_y;
+ int cam_x, cam_y;
+ int i,j, rad;
+
+ compute_frustum();
+
+ chunk_locations = chunks_considered = chunks_in_frustum = 0;
+ quads_considered = quads_rendered = 0;
+ chunk_storage_total = chunk_storage_considered = chunk_storage_rendered = 0;
+
+ cam_x = (int) floor(x+0.5);
+ cam_y = (int) floor(y+0.5);
+
+ qchunk_x = (((int) floor(x)+16) >> 5) << 1;
+ qchunk_y = (((int) floor(y)+16) >> 5) << 1;
+
+ glEnable(GL_ALPHA_TEST);
+ glAlphaFunc(GL_GREATER, 0.5);
+
+ stbglUseProgram(main_prog);
+ setup_uniforms(campos); // set uniforms to default values inefficiently
+ glActiveTextureARB(GL_TEXTURE2_ARB);
+ stbglEnableVertexAttribArray(0);
+
+ {
+ float lighting[2][3] = { { campos[0],campos[1],campos[2] }, { 0.75,0.75,0.65f } };
+ float bright = 8;
+ lighting[1][0] *= bright;
+ lighting[1][1] *= bright;
+ lighting[1][2] *= bright;
+ stbglUniform3fv(stbgl_find_uniform(main_prog, "light_source"), 2, lighting[0]);
+ }
+
+ if (global_hack)
+ set_tex2_alpha(tex2_alpha);
+
+ num_meshes_uploaded = 0;
+ update_meshes_from_render_thread();
+
+ // traverse all in-range chunks and analyze them
+ for (j=-view_dist_in_chunks; j <= view_dist_in_chunks; j += 2) {
+ for (i=-view_dist_in_chunks; i <= view_dist_in_chunks; i += 2) {
+ float priority;
+ int cx = qchunk_x + i;
+ int cy = qchunk_y + j;
+
+ priority = compute_priority(cx, cy, x, y);
+ if (priority >= min_priority) {
+ int slot_x = (cx>>1) & (CACHED_MESH_NUM_X-1);
+ int slot_y = (cy>>1) & (CACHED_MESH_NUM_Y-1);
+ chunk_mesh *cm = &cached_chunk_mesh[slot_y][slot_x];
+ ++chunk_locations;
+ if (cm->state == STATE_valid && priority >= 0) {
+ // check if chunk pos actually matches
+ if (cm->chunk_x != cx || cm->chunk_y != cy) {
+ // we have a stale chunk we need to recreate
+ free_chunk(slot_x, slot_y); // it probably will have already gotten freed, but just in case
+ }
+ }
+ if (cm->state == STATE_invalid) {
+ cm->chunk_x = cx;
+ cm->chunk_y = cy;
+ cm->state = STATE_needed;
+ }
+ cm->priority = priority;
+ }
+ }
+ }
+
+ // draw front-to-back
+ for (rad = 0; rad <= view_dist_in_chunks; rad += 2) {
+ for (j=-rad; j <= rad; j += 2) {
+ // if j is +- rad, then iterate i through all values
+ // if j isn't +-rad, then i should be only -rad & rad
+ int step = 2;
+ if (abs(j) != rad)
+ step = 2*rad;
+ for (i=-rad; i <= rad; i += step) {
+ int cx = qchunk_x + i;
+ int cy = qchunk_y + j;
+ int slot_x = (cx>>1) & (CACHED_MESH_NUM_X-1);
+ int slot_y = (cy>>1) & (CACHED_MESH_NUM_Y-1);
+ chunk_mesh *cm = &cached_chunk_mesh[slot_y][slot_x];
+ if (cm->state == STATE_valid && cm->priority >= 0) {
+ ++chunks_considered;
+ quads_considered += cm->num_quads;
+ if (is_box_in_frustum(cm->bounds[0], cm->bounds[1])) {
+ ++chunks_in_frustum;
+
+ // @TODO if in range
+ stbglUniform3fv(uniform_loc[STBVOX_UNIFORM_transform], 3, cm->transform[0]);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, cm->vbuf);
+ glVertexAttribIPointer(0, 1, GL_UNSIGNED_INT, 4, (void*) 0);
+ glBindTexture(GL_TEXTURE_BUFFER_ARB, cm->fbuf_tex);
+ glDrawArrays(GL_QUADS, 0, cm->num_quads*4);
+ quads_rendered += cm->num_quads;
+
+ chunk_storage_rendered += cm->vbuf_size + cm->fbuf_size;
+ }
+ chunk_storage_considered += cm->vbuf_size + cm->fbuf_size;
+ }
+ }
+ }
+ }
+
+ stbglDisableVertexAttribArray(0);
+ glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
+ glActiveTextureARB(GL_TEXTURE0_ARB);
+
+ stbglUseProgram(0);
+ num_meshes_started = 0;
+
+ {
+ #define MAX_QUEUE 8
+ float highest_priority[MAX_QUEUE];
+ int highest_i[MAX_QUEUE], highest_j[MAX_QUEUE];
+ float lowest_priority = view_dist_in_chunks * view_dist_in_chunks * 16 * 16.0f;
+ int lowest_i = -1, lowest_j = -1;
+
+ for (i=0; i < MAX_QUEUE; ++i) {
+ highest_priority[i] = min_priority;
+ highest_i[i] = -1;
+ highest_j[i] = -1;
+ }
+
+ for (j=0; j < CACHED_MESH_NUM_Y; ++j) {
+ for (i=0; i < CACHED_MESH_NUM_X; ++i) {
+ chunk_mesh *cm = &cached_chunk_mesh[j][i];
+ if (cm->state == STATE_valid) {
+ cm->priority = compute_priority(cm->chunk_x, cm->chunk_y, x, y);
+ chunk_storage_total += cm->vbuf_size + cm->fbuf_size;
+ if (cm->priority < lowest_priority) {
+ lowest_priority = cm->priority;
+ lowest_i = i;
+ lowest_j = j;
+ }
+ }
+ if (cm->state == STATE_needed) {
+ cm->priority = compute_priority(cm->chunk_x, cm->chunk_y, x, y);
+ if (cm->priority < min_priority)
+ cm->state = STATE_invalid;
+ else if (cm->priority > highest_priority[0]) {
+ int k;
+ highest_priority[0] = cm->priority;
+ highest_i[0] = i;
+ highest_j[0] = j;
+ // bubble this up to right place
+ for (k=0; k < MAX_QUEUE-1; ++k) {
+ if (highest_priority[k] > highest_priority[k+1]) {
+ highest_priority[k] = highest_priority[k+1];
+ highest_priority[k+1] = cm->priority;
+ highest_i[k] = highest_i[k+1];
+ highest_i[k+1] = i;
+ highest_j[k] = highest_j[k+1];
+ highest_j[k+1] = j;
+ } else {
+ break;
+ }
+ }
+ }
+ }
+ }
+ }
+
+
+ // I couldn't find any straightforward logic that avoids
+ // the hysteresis problem of continually creating & freeing
+ // a block on the margin, so I just don't free a block until
+ // it's out of range; but this doesn't correctly handle the
+ // case where the cache is too small for the given range
+ if (chunk_storage_total >= min_chunk_storage && lowest_i >= 0) {
+ if (cached_chunk_mesh[lowest_j][lowest_i].priority < -1200) // -1000? 0?
+ free_chunk(lowest_i, lowest_j);
+ }
+
+ if (chunk_storage_total < max_chunk_storage && highest_i[0] >= 0) {
+ for (j=MAX_QUEUE-1; j >= 0; --j) {
+ if (highest_j[j] >= 0) {
+ chunk_mesh *cm = &cached_chunk_mesh[highest_j[j]][highest_i[j]];
+ if (request_chunk(cm->chunk_x, cm->chunk_y)) {
+ cm->state = STATE_requested;
+ } else {
+ // if we couldn't queue this one, skip the remainder
+ break;
+ }
+ }
+ }
+ }
+ }
+
+ update_meshes_from_render_thread();
+
+ num_threads_active = 0;
+ for (i=0; i < num_mesh_workers; ++i) {
+ num_threads_active += (mesh_data[i].state == WSTATE_running);
+ }
+}
diff --git a/vendor/stb/tests/caveview/caveview.dsp b/vendor/stb/tests/caveview/caveview.dsp
new file mode 100644
index 0000000..2a462d3
--- /dev/null
+++ b/vendor/stb/tests/caveview/caveview.dsp
@@ -0,0 +1,157 @@
+# Microsoft Developer Studio Project File - Name="caveview" - Package Owner=<4>
+# Microsoft Developer Studio Generated Build File, Format Version 6.00
+# ** DO NOT EDIT **
+
+# TARGTYPE "Win32 (x86) Application" 0x0101
+
+CFG=caveview - Win32 Debug
+!MESSAGE This is not a valid makefile. To build this project using NMAKE,
+!MESSAGE use the Export Makefile command and run
+!MESSAGE
+!MESSAGE NMAKE /f "caveview.mak".
+!MESSAGE
+!MESSAGE You can specify a configuration when running NMAKE
+!MESSAGE by defining the macro CFG on the command line. For example:
+!MESSAGE
+!MESSAGE NMAKE /f "caveview.mak" CFG="caveview - Win32 Debug"
+!MESSAGE
+!MESSAGE Possible choices for configuration are:
+!MESSAGE
+!MESSAGE "caveview - Win32 Release" (based on "Win32 (x86) Application")
+!MESSAGE "caveview - Win32 Debug" (based on "Win32 (x86) Application")
+!MESSAGE
+
+# Begin Project
+# PROP AllowPerConfigDependencies 0
+# PROP Scc_ProjName ""
+# PROP Scc_LocalPath ""
+CPP=cl.exe
+MTL=midl.exe
+RSC=rc.exe
+
+!IF "$(CFG)" == "caveview - Win32 Release"
+
+# PROP BASE Use_MFC 0
+# PROP BASE Use_Debug_Libraries 0
+# PROP BASE Output_Dir "Release"
+# PROP BASE Intermediate_Dir "Release"
+# PROP BASE Target_Dir ""
+# PROP Use_MFC 0
+# PROP Use_Debug_Libraries 0
+# PROP Output_Dir "Release"
+# PROP Intermediate_Dir "Release"
+# PROP Ignore_Export_Lib 0
+# PROP Target_Dir ""
+# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_MBCS" /YX /FD /c
+# ADD CPP /nologo /MD /W3 /WX /GX /Zd /O2 /I "../.." /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_MBCS" /FD /c
+# SUBTRACT CPP /YX
+# ADD BASE MTL /nologo /D "NDEBUG" /mktyplib203 /win32
+# ADD MTL /nologo /D "NDEBUG" /mktyplib203 /win32
+# ADD BASE RSC /l 0x409 /d "NDEBUG"
+# ADD RSC /l 0x409 /d "NDEBUG"
+BSC32=bscmake.exe
+# ADD BASE BSC32 /nologo
+# ADD BSC32 /nologo
+LINK32=link.exe
+# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /machine:I386
+# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib sdl2.lib opengl32.lib glu32.lib winmm.lib sdl2_mixer.lib advapi32.lib /nologo /subsystem:windows /debug /machine:I386
+# SUBTRACT LINK32 /map
+
+!ELSEIF "$(CFG)" == "caveview - Win32 Debug"
+
+# PROP BASE Use_MFC 0
+# PROP BASE Use_Debug_Libraries 1
+# PROP BASE Output_Dir "Debug"
+# PROP BASE Intermediate_Dir "Debug"
+# PROP BASE Target_Dir ""
+# PROP Use_MFC 0
+# PROP Use_Debug_Libraries 1
+# PROP Output_Dir "Debug"
+# PROP Intermediate_Dir "Debug"
+# PROP Ignore_Export_Lib 0
+# PROP Target_Dir ""
+# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /D "_MBCS" /YX /FD /GZ /c
+# ADD CPP /nologo /MDd /W3 /WX /Gm /GX /Zi /Od /I "../.." /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /D "_MBCS" /FD /GZ /c
+# ADD BASE MTL /nologo /D "_DEBUG" /mktyplib203 /win32
+# ADD MTL /nologo /D "_DEBUG" /mktyplib203 /win32
+# ADD BASE RSC /l 0x409 /d "_DEBUG"
+# ADD RSC /l 0x409 /d "_DEBUG"
+BSC32=bscmake.exe
+# ADD BASE BSC32 /nologo
+# ADD BSC32 /nologo
+LINK32=link.exe
+# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /debug /machine:I386 /pdbtype:sept
+# ADD LINK32 kernel32.lib user32.lib gdi32.lib advapi32.lib winspool.lib comdlg32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib sdl2.lib opengl32.lib glu32.lib winmm.lib sdl2_mixer.lib /nologo /subsystem:windows /incremental:no /debug /machine:I386 /pdbtype:sept
+
+!ENDIF
+
+# Begin Target
+
+# Name "caveview - Win32 Release"
+# Name "caveview - Win32 Debug"
+# Begin Source File
+
+SOURCE=.\cave_main.c
+# End Source File
+# Begin Source File
+
+SOURCE=.\cave_mesher.c
+# End Source File
+# Begin Source File
+
+SOURCE=.\cave_parse.c
+# End Source File
+# Begin Source File
+
+SOURCE=.\cave_parse.h
+# End Source File
+# Begin Source File
+
+SOURCE=.\cave_render.c
+# End Source File
+# Begin Source File
+
+SOURCE=.\caveview.h
+# End Source File
+# Begin Source File
+
+SOURCE=.\glext.h
+# End Source File
+# Begin Source File
+
+SOURCE=.\glext_list.h
+# End Source File
+# Begin Source File
+
+SOURCE=.\README.md
+# End Source File
+# Begin Source File
+
+SOURCE=.\win32\SDL_windows_main.c
+# End Source File
+# Begin Source File
+
+SOURCE=..\..\stb.h
+# End Source File
+# Begin Source File
+
+SOURCE=..\..\stb_easy_font.h
+# End Source File
+# Begin Source File
+
+SOURCE=.\stb_gl.h
+# End Source File
+# Begin Source File
+
+SOURCE=.\stb_glprog.h
+# End Source File
+# Begin Source File
+
+SOURCE=..\..\stb_image.h
+# End Source File
+# Begin Source File
+
+SOURCE=..\..\stb_voxel_render.h
+# End Source File
+# End Target
+# End Project
diff --git a/vendor/stb/tests/caveview/caveview.dsw b/vendor/stb/tests/caveview/caveview.dsw
new file mode 100644
index 0000000..ddc9387
--- /dev/null
+++ b/vendor/stb/tests/caveview/caveview.dsw
@@ -0,0 +1,29 @@
+Microsoft Developer Studio Workspace File, Format Version 6.00
+# WARNING: DO NOT EDIT OR DELETE THIS WORKSPACE FILE!
+
+###############################################################################
+
+Project: "caveview"=.\caveview.dsp - Package Owner=<4>
+
+Package=<5>
+{{{
+}}}
+
+Package=<4>
+{{{
+}}}
+
+###############################################################################
+
+Global:
+
+Package=<5>
+{{{
+}}}
+
+Package=<3>
+{{{
+}}}
+
+###############################################################################
+
diff --git a/vendor/stb/tests/caveview/caveview.h b/vendor/stb/tests/caveview/caveview.h
new file mode 100644
index 0000000..73a71da
--- /dev/null
+++ b/vendor/stb/tests/caveview/caveview.h
@@ -0,0 +1,50 @@
+#ifndef INCLUDE_CAVEVIEW_H
+#define INCLUDE_CAVEVIEW_H
+
+#include "stb.h"
+
+#include "stb_voxel_render.h"
+
+typedef struct
+{
+ int cx,cy;
+
+ stbvox_mesh_maker mm;
+
+ uint8 *build_buffer;
+ uint8 *face_buffer;
+
+ int num_quads;
+ float transform[3][3];
+ float bounds[2][3];
+
+ uint8 sv_blocktype[34][34][18];
+ uint8 sv_lighting [34][34][18];
+} raw_mesh;
+
+// a 3D checkerboard of empty/solid would be: 32x32x255x6/2 ~= 800,000
+// an all-leaf qchunk would be: 32 x 32 x 255 x 6 ~= 1,600,000
+
+#define BUILD_QUAD_MAX 400000
+#define BUILD_BUFFER_SIZE (4*4*BUILD_QUAD_MAX) // 4 bytes per vertex, 4 vertices per quad
+#define FACE_BUFFER_SIZE ( 4*BUILD_QUAD_MAX) // 4 bytes per quad
+
+
+extern void mesh_init(void);
+extern void render_init(void);
+extern void world_init(void);
+extern void ods(char *fmt, ...); // output debug string
+extern void reset_cache_size(int size);
+
+
+extern void render_caves(float pos[3]);
+
+
+#include "cave_parse.h" // fast_chunk
+
+extern fast_chunk *get_converted_fastchunk(int chunk_x, int chunk_y);
+extern void build_chunk(int chunk_x, int chunk_y, fast_chunk *fc_table[4][4], raw_mesh *rm);
+extern void reset_cache_size(int size);
+extern void deref_fastchunk(fast_chunk *fc);
+
+#endif
\ No newline at end of file
diff --git a/vendor/stb/tests/caveview/glext.h b/vendor/stb/tests/caveview/glext.h
new file mode 100644
index 0000000..c6a233a
--- /dev/null
+++ b/vendor/stb/tests/caveview/glext.h
@@ -0,0 +1,11124 @@
+#ifndef __glext_h_
+#define __glext_h_ 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** Copyright (c) 2013 The Khronos Group Inc.
+**
+** Permission is hereby granted, free of charge, to any person obtaining a
+** copy of this software and/or associated documentation files (the
+** "Materials"), to deal in the Materials without restriction, including
+** without limitation the rights to use, copy, modify, merge, publish,
+** distribute, sublicense, and/or sell copies of the Materials, and to
+** permit persons to whom the Materials are furnished to do so, subject to
+** the following conditions:
+**
+** The above copyright notice and this permission notice shall be included
+** in all copies or substantial portions of the Materials.
+**
+** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
+*/
+/*
+** This header is generated from the Khronos OpenGL / OpenGL ES XML
+** API Registry. The current version of the Registry, generator scripts
+** used to make the header, and the header can be found at
+** http://www.opengl.org/registry/
+**
+** Khronos $Revision: 24756 $ on $Date: 2014-01-14 03:42:29 -0800 (Tue, 14 Jan 2014) $
+*/
+
+#if defined(_WIN32) && !defined(APIENTRY) && !defined(__CYGWIN__) && !defined(__SCITECH_SNAP__)
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN 1
+#endif
+#include <windows.h>
+#endif
+
+#ifndef APIENTRY
+#define APIENTRY
+#endif
+#ifndef APIENTRYP
+#define APIENTRYP APIENTRY *
+#endif
+#ifndef GLAPI
+#define GLAPI extern
+#endif
+
+#define GL_GLEXT_VERSION 20140114
+
+/* Generated C header for:
+ * API: gl
+ * Profile: compatibility
+ * Versions considered: .*
+ * Versions emitted: 1\.[2-9]|[234]\.[0-9]
+ * Default extensions included: gl
+ * Additional extensions included: _nomatch_^
+ * Extensions removed: _nomatch_^
+ */
+
+#ifndef GL_VERSION_1_2
+#define GL_VERSION_1_2 1
+#define GL_UNSIGNED_BYTE_3_3_2 0x8032
+#define GL_UNSIGNED_SHORT_4_4_4_4 0x8033
+#define GL_UNSIGNED_SHORT_5_5_5_1 0x8034
+#define GL_UNSIGNED_INT_8_8_8_8 0x8035
+#define GL_UNSIGNED_INT_10_10_10_2 0x8036
+#define GL_TEXTURE_BINDING_3D 0x806A
+#define GL_PACK_SKIP_IMAGES 0x806B
+#define GL_PACK_IMAGE_HEIGHT 0x806C
+#define GL_UNPACK_SKIP_IMAGES 0x806D
+#define GL_UNPACK_IMAGE_HEIGHT 0x806E
+#define GL_TEXTURE_3D 0x806F
+#define GL_PROXY_TEXTURE_3D 0x8070
+#define GL_TEXTURE_DEPTH 0x8071
+#define GL_TEXTURE_WRAP_R 0x8072
+#define GL_MAX_3D_TEXTURE_SIZE 0x8073
+#define GL_UNSIGNED_BYTE_2_3_3_REV 0x8362
+#define GL_UNSIGNED_SHORT_5_6_5 0x8363
+#define GL_UNSIGNED_SHORT_5_6_5_REV 0x8364
+#define GL_UNSIGNED_SHORT_4_4_4_4_REV 0x8365
+#define GL_UNSIGNED_SHORT_1_5_5_5_REV 0x8366
+#define GL_UNSIGNED_INT_8_8_8_8_REV 0x8367
+#define GL_UNSIGNED_INT_2_10_10_10_REV 0x8368
+#define GL_BGR 0x80E0
+#define GL_BGRA 0x80E1
+#define GL_MAX_ELEMENTS_VERTICES 0x80E8
+#define GL_MAX_ELEMENTS_INDICES 0x80E9
+#define GL_CLAMP_TO_EDGE 0x812F
+#define GL_TEXTURE_MIN_LOD 0x813A
+#define GL_TEXTURE_MAX_LOD 0x813B
+#define GL_TEXTURE_BASE_LEVEL 0x813C
+#define GL_TEXTURE_MAX_LEVEL 0x813D
+#define GL_SMOOTH_POINT_SIZE_RANGE 0x0B12
+#define GL_SMOOTH_POINT_SIZE_GRANULARITY 0x0B13
+#define GL_SMOOTH_LINE_WIDTH_RANGE 0x0B22
+#define GL_SMOOTH_LINE_WIDTH_GRANULARITY 0x0B23
+#define GL_ALIASED_LINE_WIDTH_RANGE 0x846E
+#define GL_RESCALE_NORMAL 0x803A
+#define GL_LIGHT_MODEL_COLOR_CONTROL 0x81F8
+#define GL_SINGLE_COLOR 0x81F9
+#define GL_SEPARATE_SPECULAR_COLOR 0x81FA
+#define GL_ALIASED_POINT_SIZE_RANGE 0x846D
+typedef void (APIENTRYP PFNGLDRAWRANGEELEMENTSPROC) (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const void *indices);
+typedef void (APIENTRYP PFNGLTEXIMAGE3DPROC) (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const void *pixels);
+typedef void (APIENTRYP PFNGLTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const void *pixels);
+typedef void (APIENTRYP PFNGLCOPYTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
+#ifdef GL_GLEXT_PROTOTYPES
+GLAPI void APIENTRY glDrawRangeElements (GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, const void *indices);
+GLAPI void APIENTRY glTexImage3D (GLenum target, GLint level, GLint internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLenum format, GLenum type, const void *pixels);
+GLAPI void APIENTRY glTexSubImage3D (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLenum type, const void *pixels);
+GLAPI void APIENTRY glCopyTexSubImage3D (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLint x, GLint y, GLsizei width, GLsizei height);
+#endif
+#endif /* GL_VERSION_1_2 */
+
+#ifndef GL_VERSION_1_3
+#define GL_VERSION_1_3 1
+#define GL_TEXTURE0 0x84C0
+#define GL_TEXTURE1 0x84C1
+#define GL_TEXTURE2 0x84C2
+#define GL_TEXTURE3 0x84C3
+#define GL_TEXTURE4 0x84C4
+#define GL_TEXTURE5 0x84C5
+#define GL_TEXTURE6 0x84C6
+#define GL_TEXTURE7 0x84C7
+#define GL_TEXTURE8 0x84C8
+#define GL_TEXTURE9 0x84C9
+#define GL_TEXTURE10 0x84CA
+#define GL_TEXTURE11 0x84CB
+#define GL_TEXTURE12 0x84CC
+#define GL_TEXTURE13 0x84CD
+#define GL_TEXTURE14 0x84CE
+#define GL_TEXTURE15 0x84CF
+#define GL_TEXTURE16 0x84D0
+#define GL_TEXTURE17 0x84D1
+#define GL_TEXTURE18 0x84D2
+#define GL_TEXTURE19 0x84D3
+#define GL_TEXTURE20 0x84D4
+#define GL_TEXTURE21 0x84D5
+#define GL_TEXTURE22 0x84D6
+#define GL_TEXTURE23 0x84D7
+#define GL_TEXTURE24 0x84D8
+#define GL_TEXTURE25 0x84D9
+#define GL_TEXTURE26 0x84DA
+#define GL_TEXTURE27 0x84DB
+#define GL_TEXTURE28 0x84DC
+#define GL_TEXTURE29 0x84DD
+#define GL_TEXTURE30 0x84DE
+#define GL_TEXTURE31 0x84DF
+#define GL_ACTIVE_TEXTURE 0x84E0
+#define GL_MULTISAMPLE 0x809D
+#define GL_SAMPLE_ALPHA_TO_COVERAGE 0x809E
+#define GL_SAMPLE_ALPHA_TO_ONE 0x809F
+#define GL_SAMPLE_COVERAGE 0x80A0
+#define GL_SAMPLE_BUFFERS 0x80A8
+#define GL_SAMPLES 0x80A9
+#define GL_SAMPLE_COVERAGE_VALUE 0x80AA
+#define GL_SAMPLE_COVERAGE_INVERT 0x80AB
+#define GL_TEXTURE_CUBE_MAP 0x8513
+#define GL_TEXTURE_BINDING_CUBE_MAP 0x8514
+#define GL_TEXTURE_CUBE_MAP_POSITIVE_X 0x8515
+#define GL_TEXTURE_CUBE_MAP_NEGATIVE_X 0x8516
+#define GL_TEXTURE_CUBE_MAP_POSITIVE_Y 0x8517
+#define GL_TEXTURE_CUBE_MAP_NEGATIVE_Y 0x8518
+#define GL_TEXTURE_CUBE_MAP_POSITIVE_Z 0x8519
+#define GL_TEXTURE_CUBE_MAP_NEGATIVE_Z 0x851A
+#define GL_PROXY_TEXTURE_CUBE_MAP 0x851B
+#define GL_MAX_CUBE_MAP_TEXTURE_SIZE 0x851C
+#define GL_COMPRESSED_RGB 0x84ED
+#define GL_COMPRESSED_RGBA 0x84EE
+#define GL_TEXTURE_COMPRESSION_HINT 0x84EF
+#define GL_TEXTURE_COMPRESSED_IMAGE_SIZE 0x86A0
+#define GL_TEXTURE_COMPRESSED 0x86A1
+#define GL_NUM_COMPRESSED_TEXTURE_FORMATS 0x86A2
+#define GL_COMPRESSED_TEXTURE_FORMATS 0x86A3
+#define GL_CLAMP_TO_BORDER 0x812D
+#define GL_CLIENT_ACTIVE_TEXTURE 0x84E1
+#define GL_MAX_TEXTURE_UNITS 0x84E2
+#define GL_TRANSPOSE_MODELVIEW_MATRIX 0x84E3
+#define GL_TRANSPOSE_PROJECTION_MATRIX 0x84E4
+#define GL_TRANSPOSE_TEXTURE_MATRIX 0x84E5
+#define GL_TRANSPOSE_COLOR_MATRIX 0x84E6
+#define GL_MULTISAMPLE_BIT 0x20000000
+#define GL_NORMAL_MAP 0x8511
+#define GL_REFLECTION_MAP 0x8512
+#define GL_COMPRESSED_ALPHA 0x84E9
+#define GL_COMPRESSED_LUMINANCE 0x84EA
+#define GL_COMPRESSED_LUMINANCE_ALPHA 0x84EB
+#define GL_COMPRESSED_INTENSITY 0x84EC
+#define GL_COMBINE 0x8570
+#define GL_COMBINE_RGB 0x8571
+#define GL_COMBINE_ALPHA 0x8572
+#define GL_SOURCE0_RGB 0x8580
+#define GL_SOURCE1_RGB 0x8581
+#define GL_SOURCE2_RGB 0x8582
+#define GL_SOURCE0_ALPHA 0x8588
+#define GL_SOURCE1_ALPHA 0x8589
+#define GL_SOURCE2_ALPHA 0x858A
+#define GL_OPERAND0_RGB 0x8590
+#define GL_OPERAND1_RGB 0x8591
+#define GL_OPERAND2_RGB 0x8592
+#define GL_OPERAND0_ALPHA 0x8598
+#define GL_OPERAND1_ALPHA 0x8599
+#define GL_OPERAND2_ALPHA 0x859A
+#define GL_RGB_SCALE 0x8573
+#define GL_ADD_SIGNED 0x8574
+#define GL_INTERPOLATE 0x8575
+#define GL_SUBTRACT 0x84E7
+#define GL_CONSTANT 0x8576
+#define GL_PRIMARY_COLOR 0x8577
+#define GL_PREVIOUS 0x8578
+#define GL_DOT3_RGB 0x86AE
+#define GL_DOT3_RGBA 0x86AF
+typedef void (APIENTRYP PFNGLACTIVETEXTUREPROC) (GLenum texture);
+typedef void (APIENTRYP PFNGLSAMPLECOVERAGEPROC) (GLfloat value, GLboolean invert);
+typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE3DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const void *data);
+typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE2DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const void *data);
+typedef void (APIENTRYP PFNGLCOMPRESSEDTEXIMAGE1DPROC) (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLsizei imageSize, const void *data);
+typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE3DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const void *data);
+typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE2DPROC) (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const void *data);
+typedef void (APIENTRYP PFNGLCOMPRESSEDTEXSUBIMAGE1DPROC) (GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLsizei imageSize, const void *data);
+typedef void (APIENTRYP PFNGLGETCOMPRESSEDTEXIMAGEPROC) (GLenum target, GLint level, void *img);
+typedef void (APIENTRYP PFNGLCLIENTACTIVETEXTUREPROC) (GLenum texture);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1DPROC) (GLenum target, GLdouble s);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1DVPROC) (GLenum target, const GLdouble *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1FPROC) (GLenum target, GLfloat s);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1FVPROC) (GLenum target, const GLfloat *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1IPROC) (GLenum target, GLint s);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1IVPROC) (GLenum target, const GLint *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1SPROC) (GLenum target, GLshort s);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD1SVPROC) (GLenum target, const GLshort *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2DPROC) (GLenum target, GLdouble s, GLdouble t);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2DVPROC) (GLenum target, const GLdouble *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2FPROC) (GLenum target, GLfloat s, GLfloat t);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2FVPROC) (GLenum target, const GLfloat *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2IPROC) (GLenum target, GLint s, GLint t);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2IVPROC) (GLenum target, const GLint *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2SPROC) (GLenum target, GLshort s, GLshort t);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD2SVPROC) (GLenum target, const GLshort *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3DPROC) (GLenum target, GLdouble s, GLdouble t, GLdouble r);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3DVPROC) (GLenum target, const GLdouble *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3FPROC) (GLenum target, GLfloat s, GLfloat t, GLfloat r);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3FVPROC) (GLenum target, const GLfloat *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3IPROC) (GLenum target, GLint s, GLint t, GLint r);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3IVPROC) (GLenum target, const GLint *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3SPROC) (GLenum target, GLshort s, GLshort t, GLshort r);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD3SVPROC) (GLenum target, const GLshort *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4DPROC) (GLenum target, GLdouble s, GLdouble t, GLdouble r, GLdouble q);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4DVPROC) (GLenum target, const GLdouble *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4FPROC) (GLenum target, GLfloat s, GLfloat t, GLfloat r, GLfloat q);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4FVPROC) (GLenum target, const GLfloat *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4IPROC) (GLenum target, GLint s, GLint t, GLint r, GLint q);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4IVPROC) (GLenum target, const GLint *v);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4SPROC) (GLenum target, GLshort s, GLshort t, GLshort r, GLshort q);
+typedef void (APIENTRYP PFNGLMULTITEXCOORD4SVPROC) (GLenum target, const GLshort *v);
+typedef void (APIENTRYP PFNGLLOADTRANSPOSEMATRIXFPROC) (const GLfloat *m);
+typedef void (APIENTRYP PFNGLLOADTRANSPOSEMATRIXDPROC) (const GLdouble *m);
+typedef void (APIENTRYP PFNGLMULTTRANSPOSEMATRIXFPROC) (const GLfloat *m);
+typedef void (APIENTRYP PFNGLMULTTRANSPOSEMATRIXDPROC) (const GLdouble *m);
+#ifdef GL_GLEXT_PROTOTYPES
+GLAPI void APIENTRY glActiveTexture (GLenum texture);
+GLAPI void APIENTRY glSampleCoverage (GLfloat value, GLboolean invert);
+GLAPI void APIENTRY glCompressedTexImage3D (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLsizei depth, GLint border, GLsizei imageSize, const void *data);
+GLAPI void APIENTRY glCompressedTexImage2D (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLsizei height, GLint border, GLsizei imageSize, const void *data);
+GLAPI void APIENTRY glCompressedTexImage1D (GLenum target, GLint level, GLenum internalformat, GLsizei width, GLint border, GLsizei imageSize, const void *data);
+GLAPI void APIENTRY glCompressedTexSubImage3D (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLint zoffset, GLsizei width, GLsizei height, GLsizei depth, GLenum format, GLsizei imageSize, const void *data);
+GLAPI void APIENTRY glCompressedTexSubImage2D (GLenum target, GLint level, GLint xoffset, GLint yoffset, GLsizei width, GLsizei height, GLenum format, GLsizei imageSize, const void *data);
+GLAPI void APIENTRY glCompressedTexSubImage1D (GLenum target, GLint level, GLint xoffset, GLsizei width, GLenum format, GLsizei imageSize, const void *data);
+GLAPI void APIENTRY glGetCompressedTexImage (GLenum target, GLint level, void *img);
+GLAPI void APIENTRY glClientActiveTexture (GLenum texture);
+GLAPI void APIENTRY glMultiTexCoord1d (GLenum target, GLdouble s);
+GLAPI void APIENTRY glMultiTexCoord1dv (GLenum target, const GLdouble *v);
+GLAPI void APIENTRY glMultiTexCoord1f (GLenum target, GLfloat s);
+GLAPI void APIENTRY glMultiTexCoord1fv (GLenum target, const GLfloat *v);
+GLAPI void APIENTRY glMultiTexCoord1i (GLenum target, GLint s);
+GLAPI void APIENTRY glMultiTexCoord1iv (GLenum target, const GLint *v);
+GLAPI void APIENTRY glMultiTexCoord1s (GLenum target, GLshort s);
+GLAPI void APIENTRY glMultiTexCoord1sv (GLenum target, const GLshort *v);
+GLAPI void APIENTRY glMultiTexCoord2d (GLenum target, GLdouble s, GLdouble t);
+GLAPI void APIENTRY glMultiTexCoord2dv (GLenum target, const GLdouble *v);
+GLAPI void APIENTRY glMultiTexCoord2f (GLenum target, GLfloat s, GLfloat t);
+GLAPI void APIENTRY glMultiTexCoord2fv (GLenum target, const GLfloat *v);
+GLAPI void APIENTRY glMultiTexCoord2i (GLenum target, GLint s, GLint t);
+GLAPI void APIENTRY glMultiTexCoord2iv (GLenum target, const GLint *v);
+GLAPI void APIENTRY glMultiTexCoord2s (GLenum target, GLshort s, GLshort t);
+GLAPI void APIENTRY glMultiTexCoord2sv (GLenum target, const GLshort *v);
+GLAPI void APIENTRY glMultiTexCoord3d (GLenum target, GLdouble s, GLdouble t, GLdouble r);
+GLAPI void APIENTRY glMultiTexCoord3dv (GLenum target, const GLdouble *v);
+GLAPI void APIENTRY glMultiTexCoord3f (GLenum target, GLfloat s, GLfloat t, GLfloat r);
+GLAPI void APIENTRY glMultiTexCoord3fv (GLenum target, const GLfloat *v);
+GLAPI void APIENTRY glMultiTexCoord3i (GLenum target, GLint s, GLint t, GLint r);
+GLAPI void APIENTRY glMultiTexCoord3iv (GLenum target, const GLint *v);
+GLAPI void APIENTRY glMultiTexCoord3s (GLenum target, GLshort s, GLshort t, GLshort r);
+GLAPI void APIENTRY glMultiTexCoord3sv (GLenum target, const GLshort *v);
+GLAPI void APIENTRY glMultiTexCoord4d (GLenum target, GLdouble s, GLdouble t, GLdouble r, GLdouble q);
+GLAPI void APIENTRY glMultiTexCoord4dv (GLenum target, const GLdouble *v);
+GLAPI void APIENTRY glMultiTexCoord4f (GLenum target, GLfloat s, GLfloat t, GLfloat r, GLfloat q);
+GLAPI void APIENTRY glMultiTexCoord4fv (GLenum target, const GLfloat *v);
+GLAPI void APIENTRY glMultiTexCoord4i (GLenum target, GLint s, GLint t, GLint r, GLint q);
+GLAPI void APIENTRY glMultiTexCoord4iv (GLenum target, const GLint *v);
+GLAPI void APIENTRY glMultiTexCoord4s (GLenum target, GLshort s, GLshort t, GLshort r, GLshort q);
+GLAPI void APIENTRY glMultiTexCoord4sv (GLenum target, const GLshort *v);
+GLAPI void APIENTRY glLoadTransposeMatrixf (const GLfloat *m);
+GLAPI void APIENTRY glLoadTransposeMatrixd (const GLdouble *m);
+GLAPI void APIENTRY glMultTransposeMatrixf (const GLfloat *m);
+GLAPI void APIENTRY glMultTransposeMatrixd (const GLdouble *m);
+#endif
+#endif /* GL_VERSION_1_3 */
+
+#ifndef GL_VERSION_1_4
+#define GL_VERSION_1_4 1
+#define GL_BLEND_DST_RGB 0x80C8
+#define GL_BLEND_SRC_RGB 0x80C9
+#define GL_BLEND_DST_ALPHA 0x80CA
+#define GL_BLEND_SRC_ALPHA 0x80CB
+#define GL_POINT_FADE_THRESHOLD_SIZE 0x8128
+#define GL_DEPTH_COMPONENT16 0x81A5
+#define GL_DEPTH_COMPONENT24 0x81A6
+#define GL_DEPTH_COMPONENT32 0x81A7
+#define GL_MIRRORED_REPEAT 0x8370
+#define GL_MAX_TEXTURE_LOD_BIAS 0x84FD
+#define GL_TEXTURE_LOD_BIAS 0x8501
+#define GL_INCR_WRAP 0x8507
+#define GL_DECR_WRAP 0x8508
+#define GL_TEXTURE_DEPTH_SIZE 0x884A
+#define GL_TEXTURE_COMPARE_MODE 0x884C
+#define GL_TEXTURE_COMPARE_FUNC 0x884D
+#define GL_POINT_SIZE_MIN 0x8126
+#define GL_POINT_SIZE_MAX 0x8127
+#define GL_POINT_DISTANCE_ATTENUATION 0x8129
+#define GL_GENERATE_MIPMAP 0x8191
+#define GL_GENERATE_MIPMAP_HINT 0x8192
+#define GL_FOG_COORDINATE_SOURCE 0x8450
+#define GL_FOG_COORDINATE 0x8451
+#define GL_FRAGMENT_DEPTH 0x8452
+#define GL_CURRENT_FOG_COORDINATE 0x8453
+#define GL_FOG_COORDINATE_ARRAY_TYPE 0x8454
+#define GL_FOG_COORDINATE_ARRAY_STRIDE 0x8455
+#define GL_FOG_COORDINATE_ARRAY_POINTER 0x8456
+#define GL_FOG_COORDINATE_ARRAY 0x8457
+#define GL_COLOR_SUM 0x8458
+#define GL_CURRENT_SECONDARY_COLOR 0x8459
+#define GL_SECONDARY_COLOR_ARRAY_SIZE 0x845A
+#define GL_SECONDARY_COLOR_ARRAY_TYPE 0x845B
+#define GL_SECONDARY_COLOR_ARRAY_STRIDE 0x845C
+#define GL_SECONDARY_COLOR_ARRAY_POINTER 0x845D
+#define GL_SECONDARY_COLOR_ARRAY 0x845E
+#define GL_TEXTURE_FILTER_CONTROL 0x8500
+#define GL_DEPTH_TEXTURE_MODE 0x884B
+#define GL_COMPARE_R_TO_TEXTURE 0x884E
+#define GL_FUNC_ADD 0x8006
+#define GL_FUNC_SUBTRACT 0x800A
+#define GL_FUNC_REVERSE_SUBTRACT 0x800B
+#define GL_MIN 0x8007
+#define GL_MAX 0x8008
+#define GL_CONSTANT_COLOR 0x8001
+#define GL_ONE_MINUS_CONSTANT_COLOR 0x8002
+#define GL_CONSTANT_ALPHA 0x8003
+#define GL_ONE_MINUS_CONSTANT_ALPHA 0x8004
+typedef void (APIENTRYP PFNGLBLENDFUNCSEPARATEPROC) (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
+typedef void (APIENTRYP PFNGLMULTIDRAWARRAYSPROC) (GLenum mode, const GLint *first, const GLsizei *count, GLsizei drawcount);
+typedef void (APIENTRYP PFNGLMULTIDRAWELEMENTSPROC) (GLenum mode, const GLsizei *count, GLenum type, const void *const*indices, GLsizei drawcount);
+typedef void (APIENTRYP PFNGLPOINTPARAMETERFPROC) (GLenum pname, GLfloat param);
+typedef void (APIENTRYP PFNGLPOINTPARAMETERFVPROC) (GLenum pname, const GLfloat *params);
+typedef void (APIENTRYP PFNGLPOINTPARAMETERIPROC) (GLenum pname, GLint param);
+typedef void (APIENTRYP PFNGLPOINTPARAMETERIVPROC) (GLenum pname, const GLint *params);
+typedef void (APIENTRYP PFNGLFOGCOORDFPROC) (GLfloat coord);
+typedef void (APIENTRYP PFNGLFOGCOORDFVPROC) (const GLfloat *coord);
+typedef void (APIENTRYP PFNGLFOGCOORDDPROC) (GLdouble coord);
+typedef void (APIENTRYP PFNGLFOGCOORDDVPROC) (const GLdouble *coord);
+typedef void (APIENTRYP PFNGLFOGCOORDPOINTERPROC) (GLenum type, GLsizei stride, const void *pointer);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3BPROC) (GLbyte red, GLbyte green, GLbyte blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3BVPROC) (const GLbyte *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3DPROC) (GLdouble red, GLdouble green, GLdouble blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3DVPROC) (const GLdouble *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3FPROC) (GLfloat red, GLfloat green, GLfloat blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3FVPROC) (const GLfloat *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3IPROC) (GLint red, GLint green, GLint blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3IVPROC) (const GLint *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3SPROC) (GLshort red, GLshort green, GLshort blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3SVPROC) (const GLshort *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UBPROC) (GLubyte red, GLubyte green, GLubyte blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UBVPROC) (const GLubyte *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UIPROC) (GLuint red, GLuint green, GLuint blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3UIVPROC) (const GLuint *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3USPROC) (GLushort red, GLushort green, GLushort blue);
+typedef void (APIENTRYP PFNGLSECONDARYCOLOR3USVPROC) (const GLushort *v);
+typedef void (APIENTRYP PFNGLSECONDARYCOLORPOINTERPROC) (GLint size, GLenum type, GLsizei stride, const void *pointer);
+typedef void (APIENTRYP PFNGLWINDOWPOS2DPROC) (GLdouble x, GLdouble y);
+typedef void (APIENTRYP PFNGLWINDOWPOS2DVPROC) (const GLdouble *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS2FPROC) (GLfloat x, GLfloat y);
+typedef void (APIENTRYP PFNGLWINDOWPOS2FVPROC) (const GLfloat *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS2IPROC) (GLint x, GLint y);
+typedef void (APIENTRYP PFNGLWINDOWPOS2IVPROC) (const GLint *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS2SPROC) (GLshort x, GLshort y);
+typedef void (APIENTRYP PFNGLWINDOWPOS2SVPROC) (const GLshort *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS3DPROC) (GLdouble x, GLdouble y, GLdouble z);
+typedef void (APIENTRYP PFNGLWINDOWPOS3DVPROC) (const GLdouble *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS3FPROC) (GLfloat x, GLfloat y, GLfloat z);
+typedef void (APIENTRYP PFNGLWINDOWPOS3FVPROC) (const GLfloat *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS3IPROC) (GLint x, GLint y, GLint z);
+typedef void (APIENTRYP PFNGLWINDOWPOS3IVPROC) (const GLint *v);
+typedef void (APIENTRYP PFNGLWINDOWPOS3SPROC) (GLshort x, GLshort y, GLshort z);
+typedef void (APIENTRYP PFNGLWINDOWPOS3SVPROC) (const GLshort *v);
+typedef void (APIENTRYP PFNGLBLENDCOLORPROC) (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
+typedef void (APIENTRYP PFNGLBLENDEQUATIONPROC) (GLenum mode);
+#ifdef GL_GLEXT_PROTOTYPES
+GLAPI void APIENTRY glBlendFuncSeparate (GLenum sfactorRGB, GLenum dfactorRGB, GLenum sfactorAlpha, GLenum dfactorAlpha);
+GLAPI void APIENTRY glMultiDrawArrays (GLenum mode, const GLint *first, const GLsizei *count, GLsizei drawcount);
+GLAPI void APIENTRY glMultiDrawElements (GLenum mode, const GLsizei *count, GLenum type, const void *const*indices, GLsizei drawcount);
+GLAPI void APIENTRY glPointParameterf (GLenum pname, GLfloat param);
+GLAPI void APIENTRY glPointParameterfv (GLenum pname, const GLfloat *params);
+GLAPI void APIENTRY glPointParameteri (GLenum pname, GLint param);
+GLAPI void APIENTRY glPointParameteriv (GLenum pname, const GLint *params);
+GLAPI void APIENTRY glFogCoordf (GLfloat coord);
+GLAPI void APIENTRY glFogCoordfv (const GLfloat *coord);
+GLAPI void APIENTRY glFogCoordd (GLdouble coord);
+GLAPI void APIENTRY glFogCoorddv (const GLdouble *coord);
+GLAPI void APIENTRY glFogCoordPointer (GLenum type, GLsizei stride, const void *pointer);
+GLAPI void APIENTRY glSecondaryColor3b (GLbyte red, GLbyte green, GLbyte blue);
+GLAPI void APIENTRY glSecondaryColor3bv (const GLbyte *v);
+GLAPI void APIENTRY glSecondaryColor3d (GLdouble red, GLdouble green, GLdouble blue);
+GLAPI void APIENTRY glSecondaryColor3dv (const GLdouble *v);
+GLAPI void APIENTRY glSecondaryColor3f (GLfloat red, GLfloat green, GLfloat blue);
+GLAPI void APIENTRY glSecondaryColor3fv (const GLfloat *v);
+GLAPI void APIENTRY glSecondaryColor3i (GLint red, GLint green, GLint blue);
+GLAPI void APIENTRY glSecondaryColor3iv (const GLint *v);
+GLAPI void APIENTRY glSecondaryColor3s (GLshort red, GLshort green, GLshort blue);
+GLAPI void APIENTRY glSecondaryColor3sv (const GLshort *v);
+GLAPI void APIENTRY glSecondaryColor3ub (GLubyte red, GLubyte green, GLubyte blue);
+GLAPI void APIENTRY glSecondaryColor3ubv (const GLubyte *v);
+GLAPI void APIENTRY glSecondaryColor3ui (GLuint red, GLuint green, GLuint blue);
+GLAPI void APIENTRY glSecondaryColor3uiv (const GLuint *v);
+GLAPI void APIENTRY glSecondaryColor3us (GLushort red, GLushort green, GLushort blue);
+GLAPI void APIENTRY glSecondaryColor3usv (const GLushort *v);
+GLAPI void APIENTRY glSecondaryColorPointer (GLint size, GLenum type, GLsizei stride, const void *pointer);
+GLAPI void APIENTRY glWindowPos2d (GLdouble x, GLdouble y);
+GLAPI void APIENTRY glWindowPos2dv (const GLdouble *v);
+GLAPI void APIENTRY glWindowPos2f (GLfloat x, GLfloat y);
+GLAPI void APIENTRY glWindowPos2fv (const GLfloat *v);
+GLAPI void APIENTRY glWindowPos2i (GLint x, GLint y);
+GLAPI void APIENTRY glWindowPos2iv (const GLint *v);
+GLAPI void APIENTRY glWindowPos2s (GLshort x, GLshort y);
+GLAPI void APIENTRY glWindowPos2sv (const GLshort *v);
+GLAPI void APIENTRY glWindowPos3d (GLdouble x, GLdouble y, GLdouble z);
+GLAPI void APIENTRY glWindowPos3dv (const GLdouble *v);
+GLAPI void APIENTRY glWindowPos3f (GLfloat x, GLfloat y, GLfloat z);
+GLAPI void APIENTRY glWindowPos3fv (const GLfloat *v);
+GLAPI void APIENTRY glWindowPos3i (GLint x, GLint y, GLint z);
+GLAPI void APIENTRY glWindowPos3iv (const GLint *v);
+GLAPI void APIENTRY glWindowPos3s (GLshort x, GLshort y, GLshort z);
+GLAPI void APIENTRY glWindowPos3sv (const GLshort *v);
+GLAPI void APIENTRY glBlendColor (GLfloat red, GLfloat green, GLfloat blue, GLfloat alpha);
+GLAPI void APIENTRY glBlendEquation (GLenum mode);
+#endif
+#endif /* GL_VERSION_1_4 */
+
+#ifndef GL_VERSION_1_5
+#define GL_VERSION_1_5 1
+#include