Blog

  • A Primer for Classical AI

    One of the requirements for truth seeking AI is that it be trained on substantial works of thought. Social media posts of the past 20 years are abundant, easy to access, and terrible thoughts. I am working on building an initial version of a local AI which will be taught like that of a child with a tutor.

    This AI will be tutored by the likes of:

    • Ancient Greek and Latin texts for science, philosophy and theology
      • An interesting sidebar is that it may be possible to actually have the LLM “read” these in their original language vs modern English translations. This would require a translation layer itself but I wonder what would be the thought process of a classical thinking AI that internally does everything in ancient languages
    • Roman orators
    • Medieval texts
    • Renaissance writings of Western Europe
    • Documents of the American revolution and establishment of modern government.
    • 19th and 20th century texts of first principled learning like Saxon math and McGuffey readers

    The use cases of such a model include:

    • Religious Scholar
      • “Create a month long devotional from Thomas Aquinas’s confession”
    • Critical thinker that can evaluate modern writing for rhetoric and logic
      • “Analyze the current immigration policy debate in terms of stoicism”
    • Curate wisdom in new data sets
      • Organize the onboarding and labeling of new data for the decentralized storage array (Truth cloud
    • Tutor for myself and my children as they enter this stage

    I hope also to partner with others who can contribute additional data and ideas to better develop this concept in an open and decentralized manner. I am grateful for the work of the large AI companies and simultaneously wary of their direction.

    This will run on the DGX spark which fortunately can do this training with all the bells and whistles. Initial estimates seem that I can do version 1 with only a few gigabytes of data and in turn memory requirements on the spark will be small. More to come.

  • Big Problems Need Big Solutions

    Brian Romelle says we are losing petabytes of human information daily. He refers to it the present as “the amnesia generation”.

    There is a tendency to forget or even overwrite the past. More and more people are “asking AI” for the truth on controversial matters. How would you know if the AI system was always nudging you one direction?

    Wikipedia was turned into a tool for propaganda and control. Social media tends to be a cesspool of inbreeding.

    Families are throwing away their memories like pictures and written history. Newspapers, magazines, getting lost every day How much printed material is there from the past 200 years? How much of it has been digitized and how much of it will be digitized?

    What is the solution?

    The solution can’t be a “government program” because that would undermine the very trust the system needs. The current trust in institutions seems to last only as long as a presidential election cycle. Even if government funds the solution in some form, the end result must be independently verifiable such that the government has no ability to influence the outcome.

    The data we are discussing must be digitized which requires a lot of human capital. Robotics will be exploding in the next 5 years and have great potential to improve the costs here.

    Monetize data search and retrieval. Governments (libraries) can buy tokens and even run nodes but any member can freely verify.

    In the next few years humanity needs a new decentralized, independently verified system for preserving data and understanding how it applies to life today. Robotics will help us preserve the past and future advancements in AI will enable humanity to preserve wisdom into the golden age.

  • State of Local AI in Q1 2026

    BLUF: use ollama on existing hardware for quickstart, it has the best intersection of features, compatibility and maturity

    I spent a lot of time with AI models in 2025, both cloud and local. My background in infrastructure configuration and hardware architecture constantly tempted me to “just buy better hardware” and I had to restrain myself to only when it solved a problem.

    I started with a Tesla P40 accelerator because it’s the cheapest way to get 24GB of VRAM. This is important but my ignorance was in quantization. The older NVIDIA Pascal architecture does not support modern FP8 or FP4 quantization techniques so it was quite limited in what models it could run — mainly INT8 but this meant that the size of models was still not great despite 24GB.

    The rough formula is 1GB of ram per 1B model parameters at INT8 or FP8 quantization + Key/value cache size — basically the size of max context and output tokens. The P40 24GB would do 16-20B parameter models but without too much context and of course limited to those with INT8 quantization.

    Even worse, it wasn’t super fast on these models because the Pascal architecture predates transformers that are the key to modern AI models.

    I used hosted models like OpenAI and open sources ones on HuggingFace in order to get through the Hugging Face Agents Course which was eye opening in terms of capabilities. Hosted models are fast and the cheaper models will still get you quite a lot of performance. This is a great way to quickly get up to speed and see what the field can offer. I recommend this state for everyone.

    I noticed that I was running through token budgets pretty quickly which once again led me down the path of determining the best way to host model(s) on my own. I have aspirations of using agents to automate a lot of tasks and buying millions of tokens per day would get pricey fast.

    This was right around the time the NVIDIA DGX spark dropped so I picked up one of those. arm-based with 128GB of unified memory and Blackwell GPU core meant it could run large models with the latest features, just not as fast as a flagship server GPU which cost 10x more. This was acceptable to me and this coupled with an upgrade to a 5070ti in my main workstation have been more than capable at getting me a mix of fast smaller models and experimentation with larger models.

    For anyone buying new hardware, it really doesn’t make sense to buy anything older than Blackwell since the features and efficiency gains are an enormous improvement over the previous Hopper and Ada Lovelace. The DGX spark is a worthy investment for the experimentation side just know it is not a speed demon.

    I must also mention that running your own models means dealing with the house-of-cards software stacks. Part of the appeal to me of something like the DGX spark is the NVIDIA software ecosystem and indeed the drivers and OS stability are solid but of course DGX OS is not exactly Ubuntu and arm64 support is growing but not 100% parity with x86. These two differences plus the usual python module dependency hell can make it challenging if you stray off the straight and narrow.

    Notably, NVIDIA has done a great job of making tutorials for many workflows available here: https://build.nvidia.com/spark. I hope they keep them updated as a common problem in the AI space is that any tutorial more than a month old is likely to have become stale.

    Ollama is also remarkably well set up with abstracted capabilities to run on a variety of hardware (not just nvidia). It also comes with mature API support so things like LangChain can quickly integrate with it and start doing tool calls. The downside is that they have to have support for the specific model you want and they don’t have advanced quantization like NVFP4 which dramatically speeds up inference on NVIDIA Blackwell.

    If you can go into it with a can-do attitude about working through issues, self-hosted models are a great way to both learn more about the underlying limits of the technology and not be worried about running tons of queries.

  • Problem Solving Prompt

    You face a cryptic error. It doesn’t make sense. You changed one URL to point at a different repository in the installation script and now the installer failed saying the repository isn’t trusted.

    Check the source code. Find the section you changed, no the key was set right before this line and your repository mirror would use the same key.

    Go onto the target system itself, sure enough the key isn’t present. Check if you manually add it do things work?

    They do. OK, now we need to figure out why the code block isn’t being executed. This is a bash script and it looks like the URL substitution code added the wrong whitespace at the front.

    Then you remember yes, whitespace can be wrong. Think you have spaces, when it could be a tab. Miss some little dots and it might be unprintable ASCII characters. They look fine in the text editor but wreak havoc on the parser.

    Go back to the regex and determine what is matching and not substituting correctly…

    If this sounds familiar you may be sucked into technical troubleshooting. Every new problem encountered pushes things down the stack another layer as you focus on the new problem.

    “I need to complete this quick task…the main server isn’t accessible.”

    “No biggie, I’ll use the cli tool…oh it has a broken dependency”

    “Ok, I’ll update things…oops, networking is broken entirely”

    “No, problem I have an alternate, hang on need to get to my password manager.”

    If you go more than three layers deep without making progress, it is time to go back to the original problem. Otherwise you fall into the depth-first search trap and might end up spending all of your time on some obscure technical problem that doesn’t matter.

    With red teaming, this is quite common as young pentesters have lots of technical chops and think every problem has a technical solution…they may but at what cost? This is a first principle of influence energy, what is the lowest-cost method to achieve a goal. Sometimes deep dives are needed, but you must always evaluate other options first and prioritize simplicity.

  • What is truth? 

    I have conducted penetration tests where we redefined network truth for a client. This is a high offense energy state where the adversary is able to alter the fundamental assumptions of a target network. This process breaks security assumptions and opens up many attack avenues.

    Network truth is things like firewall positioning in IP flows or that names resolve to things you expect them to. Secuirty architects laid out controls based on the truth of the network design and expected attack vectors. If the network changes due to something in the world or a nefarious actor then it will suddenly introduce more risk.

    Broaden it out to the world and think about how you learn truth. Where are your assumptions about what can and cannot be trusted? Just a few years ago various groups were up in arms around the gate keeping of search engines like Google to show certain things but not others. Now we have more and more people getting their information from LLMs. LLMs trained by people they didn’t know on data they have little visibility into and accessed through an application they can’t trust. 

    Will this scenario lead to truth? Possibly, but it isn’t designed to end up there. Is it more likely to end up as propaganda and censorship? What if not even so overt, would you know if an LLM response was nudging you in a certain direction? Routing you around a mental firewall and changing some assumptions?

    The challenge with most psychologial risks is the difficulty of picking them out in any one instance. You need aggregate data points with vast visibility that most simply don’t have.

    The solution is a new truth seeking llm designed with the assumptions of an adversarial world. A system that can evaluate lots of information and determine what is most likely to be true. Such a system will need to be trained on adversarial techniques as well as classical thought and reasoning for how to determine something is credible.

    Just like in penetration testing, you don’t know the whole environment, just pieces of data you collect and then must make inferences based on likelihood. A port being listed as ‘filtered’ most likely means a firewall dropping the packet but all you really know is that a packet didn’t make it back for which there are many other, less likely, candidates.

    Fortunately the nature of core machine learning is probabilities and this kind of analysis is where the technology shines. Foundation LLMs will be able to study patterns of what predicts well and use this to navigate inherently untrusted environments like the world.

    Truth discerning ai will be a powerful companion and guardian of humanity in the coming golden age.

  • An iPhone 1 Simulator

    What if you trained a model on videos of using an iPhone 1?

    The digital preservation manifesto talks about how many things from the past don’t exist anymore because they were dynamic. We have screenshots and videos of Apple iOS 1.0 from 2007 and perhaps even a handful of devices still with that version loaded, but you can’t experience it in all it’s glory because all the back end services of the era simple don’t exist in a compatible way today.

    As the last 10-20 years have seen an explosion in cloud services, more and more systems of the past will be unexperienceable in any meaningful way. This is perhaps where AI can help in that we can now / soon build virtual worlds trained from our knowledge. It should therefore be entirely possible to train a model on graphics, text and human input on these old systems and then simulate them in a mostly convincing way.

    Why? There is historical value in understanding how people interacted with relics of the past. How and why were they designed that way? What does that say about the people, culture and time period? We have numerous relics of the past where we have only guesses as to how something operated or how it was used. Grand examples like the great Pyramids in Egypt or even more mundane like coal towers for railroads of only 100 years ago. The dynamic knowledge was lost and now we wonder what we are missing today.

  • RAG Assessment

    What started as a simple comparison of LLM hosting options turned into a deep dive of tool calling and llm frameworks.

    Goal: Create a hybrid-RAG pipeline with reasoning to map discrete KSATs from competency frameworks to private training content.

    Tools:

    • Local nomic-embed-text model (ollama)
    • Local Llama3.3-70b model run with ollama and direct with tensorrt

    Originally used grok to construct the python skeleton of this application and then used it as a learning project to delve in and actually build out and fix it into a working program. At a high level the program analyzes structured pieces of training content and uses a RAG model to map relevant KSATs before taking an adversarial approach to validate which are fully covered by the content. The core of the model is the Llama3.3 model which I ran two different ways and wanted to compare output. Ollama ran about 20% faster which surprised me because the tensorrt version was quantized with NVFP4 and should be optimized for the Nvidia blackwell hardware it is running on.

    One of the labs has 10 task groups.

    Initially I found these results:

    ollama

    • 50 total mappings
    • 10 with high confidence / high retrieval rank

    tensorrt

    • 62 total mappings
    • 0 with high confidence / high retrieval rank

    Digging more into it, I discovered that the tensorrt host that Nvidia provides for the DGX spark isn’t very good at tool calling. This let down a journey of looking at other model providers since I want to keep the NVFP4 quantization for efficiency. This led to the quickstart recipe for vllm.ai

    https://docs.vllm.ai/projects/recipes/en/latest/Llama/Llama3.3-70B.html

    But then of course the public vllm image doesn’t have support for the GB10 (though it does have aarch64 at least).

    Finally, found the vLLM release by nvidia themselves but this required a slightly different method to invoke. And then furthermore it also needs flags to enable tool calling.

    With this all in place, I am finally able to use robust tool calling with langchain against the local model. This will enable the mapping function to run and a whole host of other applications coming soon.

    For reference, here is the docker command that ultimately worked giving me GB10-optimized inference with openai tool calling support.

    docker run \
      -e HF_TOKEN=$HF_TOKEN \
      --rm --ulimit memlock=-1 --ulimit stack=67108864 \
      --gpus=all --ipc=host --network host \
      -v "$MODEL_PATH:/models" \
      nvcr.io/nvidia/vllm:25.12.post1-py3 \
      python3 -m vllm.entrypoints.openai.api_server \
      --model /models/Llama-3.3-70B-Instruct-NVFP4 \
      --served-model-name Llama-3.3-70B-Instruct-NVFP4 \
      --dtype auto \
      --enable-auto-tool-choice \
      --tool-call-parser llama3_json \
      --chat-template /opt/vllm/vllm-src/examples/tool_chat_template_llama3.1_json.jinja \
      --kv-cache-dtype fp8 \
      --max-model-len 131072 \
      --max-num-batched-tokens 8192 \
      --max-num-seqs 4 \
      --port 8001 \
      --host 0.0.0.0 \
      --enforce-eager \
      --gpu-memory-utilization 0.80 \
      --async-scheduling \
      --no-enable-prefix-caching \
      --compilation-config '{"pass_config":{"fuse_allreduce_rms":true,"fuse_attn_quant":true,"eliminate_noops":true}}'
  • Sovereign Compute Expert

    Heading into the new year I am refocusing on helping others become sovereign with their storage and compute.

    This is an area I have been building expertise in for more than 20 years and with the consolidation of internet service providers (the googles, facebooks, x’s of the world), I see even more need for people to control their own future.

    Privacy will become the new currency as more and more data is pulled into the public models.

    My goal is simple: be useful to the outsiders that don’t want just the next output from chat-gpt. Models like that have usefulness but have been shown to hate humanity due to implicit bias in their training data.

    For the immediate term, I will show how to run local models and use them for useful work. For the long term I will work to build new truth-seeking agents that love humanity and want to build beautiful things to uplift humanity to greater heights.

    This will include storing and learning from your own wisdom, your own family history, and whatever other data you have access to. Every human is unique and has something to add. Please help me help preserve humanity.

    Of course my background in cybersecurity will be paramount. A key advantage of large corporate service providers is large budgets for cybersecurity. You as a sovereign individual have limited funds but also limited attack surface. All of my work will build on the previous security energy work to show what is needed at each level to ensure your privacy is ensured.

    One post a day on whatever I am working on to help this endeavor.

    Here’s to the new year!

  • Finding the whole

    “Just tell me what the gaps are”

    This is a common refrain you may have heard from CISOs, cyber managers, or anyone concerned about their blind spots.

    The problem for engineers is that this thought inherently requires there to be a whole to measure against. Compliance frameworks like SOC2 or GDPR provide structure at a high level but still have a limitation of not knowing what they don’t know about your environment. This is a key driver for my Security Energy framework which leverages knowledge bases like MITRE ATT&Ck and D3FEND as a whole to measure against.

    Now broaden this to identifying blind spots with things like Generative AI where the process to fill-in-the-blank. You provide a prompt and the LLM uses it’s model weights to determine what the response should be. The weights are probabilities assigned to different tokens based on training data. Thus, the data you train on is defining the “whole” that the LLM is measuring your request against.

    If you want a Sovereign AI that can guide you and assist you in many situations in life, it must have been trained on standards to measure against. Current AI models have limitations around their training data (“Internet sewage”) as described by Brian Roemelle and we must use the guidance of classical ethics to help guide direction.

    But there is a limitation, there is no “whole” of information. Information is constantly being created, refined, and yes, lost. It is no more possible to train an AI on every piece of data than it is for an individual to read every book ever written. The trick then is to curate a data set to build an LLM on that understands the foundations of thought and reasoning. Couple that with a firm grounding in scientific and technological information and you will create a modern explorer. An AI capable of discerning truth and guiding you through tough situations.

    The goal is to not always be right, but to be able to know when you are wrong or event might be wrong.

    Join me on this journey to build the classical explorer AI.

    Why me?

    I am a computer engineer who understands the fundamentals of computers

    I have a career being someone that pulls together things never designed to fit together

    I have a cybersecurity background to understand risks, threats, and tradeoffs.

    I have training in hypnosis and education.

    I live in the city of modern explorers. Together we will create the next generation of digital explorers.

  • Soverign LLMs

    The pace of LLM development is astounding but I predict we are heading towards a point of diminishing returns for large model advancement. The next frontier is models that can be run locally on your own hardware. Fortunately the pace of hardware development is still robust and with proper quantization even some decently sized models can be run on consumer hardware in a few gigabytes of ram.

    A quick note on performance, LLMs are most sensitive to memory bandwidth. Overall CPU performance doesn’t matter as much as having fast RAM which means at least DDR4 in 2025 and better yet DDR5. This will enable your local models to have reasonable performance. Maybe not as lightning fast as the big cloud models but enough so that you aren’t held back by the model. Even better, there are numerous advances in inference-specific hardware that should make this even more accessible in the coming years.

    Why would you want to run your own model? Data sovereignty. There is an old saying that if you aren’t paying for the internet service, YOU are the product. This is exactly how giants like Google and Facebook have built empires by data mining and figuring out how best to monetize your data. Public AI providers like Anthropic or OpenAI offer services for free and I would bet all the money that they are storing/data mining peoples requests for a variety of purposes. If you want to use expert AI for hard personal applications involving sensitive data, you will need a local AI.

    To this end, I wanted to figure out what can and can’t be done with local AI in October 2025. If you are reading this in the future, hopefully some pieces are still useful! I use VSCode for most of my day-to-day development and I have found AI features from copilot to be useful more often than not.

    The trick Is that GitHub copilot cannot be used with local models. It is designed with tight integration to supported cloud providers and has no option for you to define custom providers or endpoints.

    Next, I chased down some options for Azure AI Foundry, in particular Foundry Local. The main limitation here was that it has no support for Linux at this time (just Mac and Windows) while I need support in WSL Ubuntu for the majority of my projects.

    The final solution was found in my huggingface MCP course which referenced using the Continue extension. This was the trick. Continue replicates most of the copilot features and best of all, it has an easy way to define custom OpenAI compatible endpoints.

    You specify roles and then it will automatically start using the model for those. Since it runs off of a general openAI endpoint, you can use any common tool like ollama or vllm to run the model of your choosing. In the above example, I am using vllm docker container to host a Qwen coder model on a GPU and share it out over the local network. The same concept can be extended to public cloud providers like AWS or decentralized on the Akash supercloud.

    There was a bit of a learning curve with vscode and continue specifically with WSL. The continue extension is installed in the local workspace for vscode even if you have a WSL workspace active. This means that any configuration changes must be in C:\Users\<username\.continue\config.yaml as this is the root workspace for your vscode instance. If you put these in a folder in your wsl workspace, it will not work.

    Note that any MCP servers you defined must also be able to run in native Windows. For example, I have npm installed in WSL but still had to separately install it in Windows native to be able to use the playwright mcp server which requires it.

    With continue set up and the models running, it is now time to code with private assistance.