<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi Lyndsey, <br>

    </p>

    <p>The comments here are from my personal perspective, not those of

      any academic or institutional affiliations I have. <br>

    </p>

    <p>Firstly I want to clarify some terms. AI and machine learning is

      a very broad space - it's used in almost every vertical - defence,

      aerospace, marketing, business analytics, education and so on. A

      large language model (LLM) is a specific type of machine learning

      model that is _predictive_ and _generative_. Given a prompt, it

      responds with an output that best matches the prompt _based on

      the data it has been trained on_. Other types of machine learning

      models (for example, image generators like DALL-E 2) use different

      algorithms and are trained on different types of data. If you

      train an LLM on a particular set of data, for example, "movies",

      then its predictions will reflect what it's been trained on.</p>
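    <p>To make that last point concrete, here is a minimal sketch in
      Python. It is a toy word-pair (bigram) counter, not a real LLM
      (those are neural networks, so this is a deliberately simpler
      stand-in), and the two tiny corpora and the function names
      train_bigram_model and predict_next are made up for illustration.
      It only shows that the same code gives different predictions
      depending on the data it was trained on.</p>
    <pre>
```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count which word follows which in the training text."""
    following = defaultdict(Counter)
    words = corpus.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower seen in training, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy training sets, invented for this example.
us_corpus = "fill up at the gas station then park near the gas pump"
au_corpus = "fill up at the servo then park near the servo shop"

us_model = train_bigram_model(us_corpus)
au_model = train_bigram_model(au_corpus)

print(predict_next(us_model, "the"))    # gas
print(predict_next(au_model, "the"))    # servo
print(predict_next(us_model, "servo"))  # None: never seen in training
```
    </pre>
    <p>The model trained on the American text has nothing to say about
      "servo", because that word never appeared in its data.</p>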

    <p>## Distinguishing open source models from open source data <br>

    </p>

    <p>Models are built on data. <br>

    </p>

    <p>While a model can be open sourced, that isn't enough to make it

      "open source AI", IMHO - and the Open Source Initiative is having

      a broader conversation around what is meant by "open source AI"

      [5]. The training data, algorithm, code, dependencies etc are also
      required for reproducibility before I'd call it "truly open". </p>


    <p>The key thing to keep in mind is that models are trained on

      *data* - it's not so important where the model is developed, but

      it's really important where the data comes from (this is the focus

      of the movement called "data-centric AI" [0]). <br>

    </p>

    <p>## LLMs in the Australian context <br>

    </p>

    <p>The challenge here for the Australian context is that many LLMs

      are either not trained on Australian-specific data, _or_, their

      predictions do not mirror the Australian context because their

      training set has little Australian content compared to American

      content. This can be partly worked around using prompts - for example "Please
      create a news headline and two paragraphs of copy of an important
      event in Melbourne in 2002. Use Australian English." - given
      this prompt, ChatGPT (3) will use Australian spelling and
      Melbourne-specific landmarks. ChatGPT also seems to have some

      grounding in Indigenous knowledges, such as Bunjil. But if I ask

      ChatGPT the question "There's a bingle at Broady and the Western's

      chokkas back to the servo. What should I do?" then it quickly

      degrades to general advice, rather than context-specific

      suggestions ("Mate, can you turn off to Donnybrook Road or Plenty

      Road?"). <br>

    </p>


    <p>AFAIK there are no open source, public LLMs being developed

      within Australia. </p>

    <p>The main academic conference for LLM work in Australia (which is

      considered less prestigious than international conferences such as

      NeurIPS [1] and ICML [2]) is the annual workshop of the Australasian
      Language Technology Association (ALTA), which I attended in December. My notes from

      this are public [3]. The main focus of this conference was the

      use of LLM technology in healthcare - such as

      mining medical records to assist health professionals in making

      accurate and timely diagnoses. These language models *are* being

      made open source, but they are smaller and much more specific than

      ChatGPT. </p>

    <p>Indeed, there was a lot of conversation at the conference about

      the *need* for a research or open source LLM in Australia because

      the costs of using ChatGPT and others (Claude, Bard) quickly
      add up. <br>

    </p>

    <p>In summary, there is a lot of scope for creating an

      Australian-specific, open source LLM. AFAIK, one doesn't exist. <br>

    </p>

    <p>## Other open source LLM efforts <br>

    </p>

    <p>The main open source LLM efforts are: <br>

    </p>

    <p>* <a class="moz-txt-link-freetext" href="https://www.eleuther.ai/">https://www.eleuther.ai/</a></p>

    <p>* Llama 2 by Meta</p>

    <p>* <a class="moz-txt-link-freetext" href="https://falconllm.tii.ae/">https://falconllm.tii.ae/</a></p>

    <p>* Mistral AI <br>

    </p>

    <p>None of these efforts are Australian-based. <br>

    </p>

    <p>## Other Australian open AI efforts <br>

    </p>

    <p>Many universities are tackling the AI-generated / LLM-generated

      text issue by using AI-detection tools, primarily within the

      Turnitin suite. Turnitin LLC is headquartered in California, USA.

    </p>

    <p>From conversations I've had with archivists in Australian

      collection institutions, there is also a need for

      Australian-specific speech recognition tools - because tools like

      Whisper do not recognise Australian accented speech as well as

      other accents [4]. These tools are being used to transcribe
      audiovisual archives. Again, a lot of this comes down to the *data*

      that the model is trained on. The key problem here is that Whisper

      was trained on 680k hours of speech data. To train a comparable

      model, you would need hundreds of thousands of hours of

      Australian-accented speech. The AusTalk archive, for example, from

      memory has maybe a couple of thousand hours (it's offline so I

      can't check). <br>

    </p>

    <p>In the last week, we've also seen rapid advances in voice

      synthesis - with MyShell releasing OpenVoice voice cloning

      technology [6]. The *model* is openly available, but again, the

      data, algorithms and code are not. The question here for
      Australia is - do we want to be able to clone Australian-accented
      voices? (yeah, nah ;-) There are no Australian TTS / voice cloning

      efforts that are open source, AFAIK. This raises major ethical

      questions for the likes of the ATO, which uses voice recognition to
      verify identity (and whose system has previously been spoofed [7]). <br>

    </p>

    <p>AI is rapidly being adopted into cybersecurity efforts,

      particularly in the field of adversarial AI. These capabilities

      are predominantly the domain of the Acronym Agencies (ASIO, ASD

      etc), and the folks at BSides might be useful to talk to about

      Australian efforts here. </p>

    <p>In terms of Australian AI institutes, there are a few: <br>

    </p>

    <p>* The Data61 CSIRO National Artificial Intelligence Centre -

      which doesn't actually produce any AI or ML; its remit is to

      encourage AI adoption -

<a class="moz-txt-link-freetext" href="https://www.csiro.au/en/work-with-us/industries/technology/National-AI-Centre">https://www.csiro.au/en/work-with-us/industries/technology/National-AI-Centre</a></p>

    <p>* Australian Institute for Machine Learning at University of

      Adelaide - research into AI / ML -

      <a class="moz-txt-link-freetext" href="https://www.adelaide.edu.au/aiml/">https://www.adelaide.edu.au/aiml/</a></p>

    <p>* A2I2 Institute at Deakin University -

      <a class="moz-txt-link-freetext" href="https://a2i2.deakin.edu.au/">https://a2i2.deakin.edu.au/</a></p>

    <p>* UNSW AI Institute - <a class="moz-txt-link-freetext" href="https://www.unsw.edu.au/unsw-ai">https://www.unsw.edu.au/unsw-ai</a></p>

    <p>Invariably, the academic institutions offer various
      (mostly postgrad) programs, with a heavy emphasis on "engaging

      industry" (read: getting industry to fund AI research because the

      government's research funding is paltry). <br>

      <br>

    </p>

    <p>## The problem of national capability and why the business

      adoption centres are exacerbating rather than addressing this,

      IMHO<br>

    </p>

    <p>(I was about to write "sovereign capability" but, as I was
      recently and correctly reminded, sovereignty was never ceded). <br>

    </p>

    <p>This all brings me to the key problem I have with the business

      adoption initiative. What I've outlined above is that Australia

      has very little national capability in AI, and even less in open

      source AI. What we adopt is, predominantly, American-owned AI that

      might then be shoe-horned into an Australian context. Sure,

      businesses should be looking to adopt AI to remain competitive.

      But the only AI they can adopt at the moment is, largely, American

      AI. <br>

    </p>

    <p>What the business adoption initiative seeks to do is spur

      *adoption* rather than *development*. What I would like to see

      happen is the development of national AI capability, preferably in

      the form of open source products that can be used by Australian

      businesses and organisations nationally. Perhaps one of our

      national organisations should focus on that, rather than

      encouraging Australian businesses to spend money overseas ...  <br>

    </p>

    <p>Kind regards, <br>

    </p>

    <p>Kathy Reid <br>

    </p>

    <p><br>

    </p>

    <p>[0] <a class="moz-txt-link-freetext" href="https://dcai.csail.mit.edu/">https://dcai.csail.mit.edu/</a></p>

    <p>[1] <a class="moz-txt-link-freetext" href="https://neurips.cc/">https://neurips.cc/</a><br>

    </p>

    <p>[2] <a class="moz-txt-link-freetext" href="https://icml.cc/">https://icml.cc/</a></p>

    <p>[3] <a class="moz-txt-link-freetext" href="https://blog.kathyreid.id.au/2023/12/10/alta2023/">https://blog.kathyreid.id.au/2023/12/10/alta2023/</a></p>

    <p>[4] My PhD research, forthcoming</p>

    <p>[5]

      <a class="moz-txt-link-freetext" href="https://blog.opensource.org/open-source-ai-establishing-a-common-ground/">https://blog.opensource.org/open-source-ai-establishing-a-common-ground/</a></p>

    <p>[6] <a class="moz-txt-link-freetext" href="https://research.myshell.ai/open-voice">https://research.myshell.ai/open-voice</a></p>

    <p>[7] <a class="moz-txt-link-freetext" href="https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai">https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai</a><br>

    </p>

    <p><br>

    </p>


    <div class="moz-cite-prefix">On 7/1/24 14:54, Lyndsey Jackson via

      linux-aus wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CACwVwkWiEXM4WzF7a8+yPJOOAc_=MJm1QnoMugrwvbWqNV+=JA@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="auto">

        <div dir="ltr">Hi all, <br>

        </div>

        <div dir="ltr">

          <div><br>

          </div>

          <div>on a bit of a fact finding reach out for people or

            connections from people working on open AI/LLM projects. </div>

          <div><br>

          </div>

          <div>Late last year a proposal for AI Centres to help

            SME's adopt AI dropped. <a

href="https://business.gov.au/grants-and-programs/artificial-intelligence-ai-adopt-program"

              style="text-decoration-line:none;color:rgb(66,133,244)"

              moz-do-not-send="true" class="moz-txt-link-freetext">https://business.gov.au/grants-and-programs/artificial-intelligence-ai-adopt-program</a> Before

            the holiday break I did some work on a proposal concept for

            agricultural value add (which is very, very broad), and I

            have insight into how some key groups were considering

            approaching it. </div>

          <div dir="auto"><br>

          </div>

          <div>And if you have any tech/advice/ideas/groups please let

            me know, I might not get a group to put a bid in but that's

            ok. I still want to know what's happening in open source and

            who is working on it. </div>

          <div><br>

          </div>

          <div><br>

          </div>

          <div>Thanks, </div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Lyndsey<br>

            <div dir="ltr" data-smartmail="gmail_signature">

              <div dir="ltr">

                <div style="font-size:small">

                  <div><br>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

      <br>


    </blockquote>

  </body>

</html>