With Next.js, it is very easy to deploy sites whose pages are statically generated. You develop your page as a "template" and tell Next.js how to retrieve the data for each statically generated path. This is exactly the process used on Kanjisho when you open the page of a kanji, for example.
Next.js calls this feature SSG (Static Site Generation) and makes it simple to implement: once your page has been created, you add two functions, getStaticPaths and getStaticProps, to your file to activate SSG, then rename your page so that it can "capture" the path your user wants to visit. In my case, the page is located at /pages/kanji/[kanji].tsx because I wanted the final paths to be of the form /kanji/月.
First, you need to tell Next.js all the possible paths for this page. If you are building a blog, you will typically have one path per article. In my case, I have one path per kanji. I am using TypeScript for Kanjisho, so my examples include the PathsProps interface and the type of getStaticPaths. More information is available in the official Next.js documentation.
Most of my work went into the function I called getKanjisPaths, which returns a structured array with a unique path per kanji. I did not want to use an API to retrieve the list of existing kanjis, for several reasons: my internet connection is slow, I did not want to risk overloading an external service with requests at build time, and above all I did not want to depend on an external service. If I use an API and it stops working from one day to the next, my site will no longer build. There is also a big advantage to building kanji pages with SSG: the information about them will (almost) never change.
So I decided to download all the data from the KanjiAPI site in JSON format (about 90MB of data) and parse it at build time to retrieve the information I needed. It's perfect: my data is saved locally, and if I want to update it, I can always download a newer version from KanjiAPI.
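As a rough illustration of this step, here is a minimal sketch of what getKanjisPaths and getStaticPaths could look like. The shape of the local data (a record keyed by kanji character) and the StaticPathsResult type are assumptions made for this example; in a real page you would use the GetStaticPaths type exported by Next.js and load the actual downloaded file.

```typescript
// Shape of one path entry: `params.kanji` fills the [kanji] segment of the route.
interface PathsProps {
  params: { kanji: string };
}

// Local stand-in for what getStaticPaths returns (assumption for this sketch;
// a real page would rely on the GetStaticPaths type from "next").
interface StaticPathsResult {
  paths: PathsProps[];
  fallback: boolean;
}

// Hypothetical excerpt of the parsed local JSON file, keyed by kanji character.
const kanjiData: Record<string, { stroke_count: number }> = {
  "月": { stroke_count: 4 },
  "日": { stroke_count: 4 },
};

// One path per kanji. Keys of an object are unique, so the paths are too.
function getKanjisPaths(data: Record<string, unknown>): PathsProps[] {
  return Object.keys(data).map((kanji) => ({ params: { kanji } }));
}

// In pages/kanji/[kanji].tsx this function would be exported.
// `fallback: false` makes any path not listed here return a 404.
async function getStaticPaths(): Promise<StaticPathsResult> {
  return { paths: getKanjisPaths(kanjiData), fallback: false };
}
```

With the sample data above, getKanjisPaths returns two entries, which Next.js would turn into /kanji/月 and /kanji/日.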
Now that you have defined which paths Next.js accepts for your page, you need to provide each path with its props in order to inject content into your "template" page.
After retrieving the kanji parameter passed on by getStaticPaths, which identifies the path whose props we want, I call a getKanjiProps function that I wrote, which returns an object of type KanjiPageProps. This object is then passed as props to my page, where I can access my data with props.kanji.stroke_count, for example.
As with getStaticPaths, I look up the kanji-related data in my big JSON file, build my Kanji-type object, and return the props object my page needs. That's it! Our pages can now be generated with SSG without any problem.
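To make the flow concrete, here is a minimal sketch of getStaticProps and getKanjiProps. The Kanji interface is reduced to two fields and the data is a hypothetical in-memory excerpt; the real object holds more fields and comes from the local JSON file. Returning `notFound: true` for an unknown kanji is my own addition for the sketch, not something the original page necessarily does.

```typescript
// Reduced version of the Kanji type (assumption: the real one has more fields).
interface Kanji {
  kanji: string;
  stroke_count: number;
}

// Props received by the page component.
interface KanjiPageProps {
  kanji: Kanji;
}

// Hypothetical excerpt of the parsed local JSON file, keyed by kanji character.
const kanjiData: Record<string, Kanji | undefined> = {
  "月": { kanji: "月", stroke_count: 4 },
};

// Look up one kanji and wrap it in the props object the page expects.
function getKanjiProps(character: string): KanjiPageProps | null {
  const entry = kanjiData[character];
  return entry ? { kanji: entry } : null;
}

// In pages/kanji/[kanji].tsx this function would be exported.
// `params.kanji` is the value produced by getStaticPaths for this path.
async function getStaticProps(context: { params: { kanji: string } }) {
  const props = getKanjiProps(context.params.kanji);
  if (props === null) {
    return { notFound: true as const };
  }
  return { props };
}
```

Inside the page component, the data is then available as props.kanji.stroke_count.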
For a blog with articles, we generally call the API of a headless CMS to retrieve the slugs of all our articles (getStaticPaths), then make one API call per slug to retrieve the content of each article (getStaticProps). For Kanjisho, to avoid a huge number of API calls, it was best to download a large JSON file containing all the data.
At build time, Next.js calls the getStaticPaths function of the [kanji].tsx page only once, then calls getStaticProps once for each path returned by getStaticPaths. This means that when building Kanjisho, the [kanji].tsx page requires 1 + 13,026 reads of a 90MB JSON file, so it is essential to design your algorithms to be fast and your file to be easy to traverse.
At first, my static generation was extremely slow, and many times the deployment of the site on Vercel failed with errors caused by a lack of memory during the build phase.
I first thought that the problem could be related to Next.js or Vercel, but then I decided to dig into my code to find what could be causing memory problems. I had assumed that importing my JSON file with an import (or require) would cache it automatically, but that was absolutely not the case! So I designed a very simple cache system and loaded my JSON file through the filesystem. Using the filesystem is possible because the Next.js build process runs server-side, not client-side.
Thanks to this system, the number of times I opened my JSON file went from 13,027 to just 1. My space complexity went from O(n) to O(1) and the memory used became ridiculously low.
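A cache like this can be very small. The sketch below is one possible version, not necessarily the one used on Kanjisho: createCachedLoader and the KanjiData type are names invented for the example. The file reader is passed in as a function so that the demo can count reads; in the real build it would be something like `() => readFileSync(path, "utf8")` from Node's fs module.

```typescript
// Hypothetical shape of the parsed file, keyed by kanji character.
type KanjiData = Record<string, { stroke_count: number }>;

// Returns a loader that reads and parses the file at most once.
// Later calls return the object that is already in memory.
function createCachedLoader(readText: () => string): () => KanjiData {
  let cache: KanjiData | null = null;
  return () => {
    if (cache === null) {
      cache = JSON.parse(readText()) as KanjiData;
    }
    return cache;
  };
}

// Demo reader standing in for the filesystem; it counts actual reads.
let reads = 0;
const loadKanjiData = createCachedLoader(() => {
  reads += 1;
  return '{"月": {"stroke_count": 4}}';
});

loadKanjiData(); // first call: reads and parses
loadKanjiData(); // second call: served from the cache
console.log(reads); // prints 1
```

The cache lives in a closure, so it is shared only by calls that run in the same process.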
While statically generating other pages, such as the kanji index page that displays kanjis sorted by number of strokes, I ran into another problem: the static generation was taking way too much time and my builds ended with a timeout error on Vercel after 45 minutes. This problem was not related to the space complexity of my getStaticProps function, but to its time complexity.
To provide context, I wanted to generate pages of the form /kanjis/strokes/[count]/[page].tsx with SSG. The objective was to sort the kanjis by number of strokes and then paginate the results, all at build time, so that I would not have to do it client-side, which would have performed poorly (there are 1167 kanjis with 12 strokes, for example).
The problem here was my approach: in getStaticProps, I parsed the JSON file, then retrieved the kanjis for the requested number of strokes, then sliced the results to return the desired data to my page. This work had not been optimized at all and, sparing you the details, the time complexity exceeded O(n²). With the possible combinations of paths (33 possible values for [count], each with its own range of [page] values), this function alone was called more than 100 times. I also have two other index pages for kanjis (sorted by JLPT level, then by grade) which used the same algorithms to split and paginate kanjis. In the end, we are talking about more than 200 calls to a getStaticProps function whose O(n²) algorithms process a 90MB file. I had found the source of my timeout.
The solution was ultimately simple: set up scripts that parse my initial data.json file and generate one pre-formatted file per use: data_strokes.json, data_jlpt.json and data_grades.json. Each generated file is a JSON object that maps every [count] to an object which, in turn, maps every [page] to a list of kanjis. The file is smaller, and accessing the kanji data for 12 strokes on page 3 becomes a direct keyed lookup in dataStrokes: the time complexity has dropped to O(1).
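Here is a minimal sketch of what such a pre-processing script could do for the strokes index. The function name buildStrokesIndex, the page size, the page numbering starting at 1 and the sample data are all assumptions for the example; the real script would read data.json and write the result to data_strokes.json.

```typescript
interface Kanji {
  kanji: string;
  stroke_count: number;
}

// stroke count -> page number -> kanji characters on that page
type PagedIndex = Record<number, Record<number, string[]>>;

// Single pass over the data: each kanji is appended to the current page
// of its stroke count, and a new page is started once a page is full.
function buildStrokesIndex(kanjis: Kanji[], pageSize: number): PagedIndex {
  const index: PagedIndex = {};
  const seenPerCount: Record<number, number> = {};
  for (const k of kanjis) {
    const position = seenPerCount[k.stroke_count] || 0;
    const page = Math.floor(position / pageSize) + 1; // pages start at 1
    if (index[k.stroke_count] === undefined) {
      index[k.stroke_count] = {};
    }
    if (index[k.stroke_count][page] === undefined) {
      index[k.stroke_count][page] = [];
    }
    index[k.stroke_count][page].push(k.kanji);
    seenPerCount[k.stroke_count] = position + 1;
  }
  return index;
}

// Tiny hypothetical sample; the real input has about 13,000 kanjis.
const sample: Kanji[] = [
  { kanji: "一", stroke_count: 1 },
  { kanji: "二", stroke_count: 2 },
  { kanji: "十", stroke_count: 2 },
  { kanji: "人", stroke_count: 2 },
];

const dataStrokes = buildStrokesIndex(sample, 2);

// This string is what the script would write to data_strokes.json.
const fileContents = JSON.stringify(dataStrokes);

// At build time, getStaticProps only needs two keyed lookups.
const secondPageOfTwoStrokes = dataStrokes[2][2];
```

The expensive grouping and slicing happens once, when the script runs, instead of once per generated page.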
It is quite easy to generate pages using the SSG feature of Next.js. It is more difficult, when you want to scale, to do this efficiently. As part of Kanjisho's development, I went from a build taking over 8GB of memory and over 55 minutes to complete to a 3-minute build that only consumes 90MB of memory.
In this kind of situation, it is important to question the quality of your algorithms and your memory accesses, so that you do not ruin your development experience or your serverless host.
I would like to draw your attention to the fact that Kanjisho uses multiple sources of data provided by third parties.
The search results are provided by the wonderful Jisho API, which itself uses various data sources that you can find on the Jisho website. The data concerning the kanji strokes is provided by the KanjiVG project, under the CC BY 3.0 FR license. I also use Tatoeba to provide example sentences and Kuroshiro to obtain the furigana and romaji versions of the sentences.
If you find any bugs, errors or malfunctions, please let me know so that I can correct them.