Baby LLaMA 2 Optimization for Duo (Kids Book Read Aloud)

Index:S2311

Award:20000Chinese Yuan

Challenge Description

It is a feat to get Baby LLaMA 2 running on small development boards like the Milk-V Duo. This challenge aims to improve Baby LLaMA 2 performance on Milk-V Duo, as assessed by tokens processed per second. The participant will make use of lightweight techniques and compiler optimization principles to develop a story-telling robot with support for microphone and command-line input. The demo should output robotic voice via speakers.

Rubrics

Submissions will be evaluated based on both accuracy and performance, each submission will be given the same input and assessed based on reference testsuites for accuracy and performance.
Accuracy will be assessed based on a delta difference between submitted output and reference output, evaluating the application of optimization techniques and their effect on the accuracy of model inference. Performance will be assessed based on the number of tokens processed per second.
Time elapsed during Text-to-Speech (TTS) computation.
The host will assess both accuracy and performance and assign scores to each submission based on a predetermined weighted scale. The highest scoring submission wins.

Submission

Please submit your completed project at https://github.com/plctlab/rvspoc-s2311-llama2 as a pull request.
Please outline the software environment needed to run your submission in your pull request body.
During the competition, all optimised artifacts may be submitted as the following:
1. Binary files.
2. Encrypted code (please send decryption details to rvspoc@cyberlimes.cn).
3. Source code.
After the results are announced, you will need to publish source code for all submissions.
The committee will close pull request submissions after the challenge’s run time (on ~~February 16~~February 29, 2024) and commence assessment of submissions.

Assessment

The assessment platform for this challenge is Duo (64MB model).
- We will use a specific microSD (TF) card for assessment.
- Swap usage is not recommended.
- No overclocking will be applied.
- No additional cooling will be provided.
We do not limit the operating systems used on the Duo.
Baby Llama 2’s repository can be found at https://github.com/karpathy/llama2.c.
Participants must submit:
- An SD card system image for the Duo board.
- Regression test pass rate and detailed testing results.
- Performance test pass rate and detailed testing results.
- Testing challenge and OpenCV runtime library, testing procedures and instructions, and scripts.
The committee will benchmark and average the results from the following software configuration, running on the hardware platform as outlined in item 1, as a baseline for assessing the optimized submissions:
1. Baby Llama 2 binary built from unedited C files using -Ofast.
2. Official cross compiler toolchain provided by Milk-V (version to be announced).
3. Default Milk-V Duo system image (version to be announced).
4. Identical input and parameters (to be announced).
5. Large Language Model files used are as follows:
  - We will use the 15M model in the Baby Llama 2 repository (pure FP32 model) for pure FP32 inference (run.c).
  - SHA-256 checksum：cd590644d963867a2b6e5a1107f51fad663c41d79c149fbecbbb1f95fa81f49a
Correctness assessment: As we will be using a specific input (with a specific seed), the output should be consistent across different runs. The committee will therefore compare the output with the run.c binary generated, as outlined in item 4. Any discrepancies will be assessed in accordance with common human grammatical language - correct grammar will not be penalized (penalties for incorrect grammar are to be decided).
Performance assessment: All submissions will be run 30 times. The results will be taken average after removing the significant highest and lowest scores.
If the text-to-speech (TTS) feature is implemented, extra credits will be awarded based on completeness of implementation (the amounts of extra credits are to be decided).
The final assessment will contain remarks on each assessment items.
The details of the challenges may change during the competition as we optimise and clarify the rubrics and instructions. Please keep an eye out for updates. The assessment committee reserves the right to final interpretation of all rubrics and instructions.

Notice on Intellectual Properties and Open Source Licenses

All challenge submissions must be open source and committed to a specified repository. The participant(s) (author) holds rights to their work. The host encourages contributing any changes made to the upstream.

Resources

Live Replay & Documents (Chinese)： plctlab/rvspoc:archives/2023/Docs/S2311