<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Lyndsey, <br>
</p>
<p>These comments are my personal perspective and do not represent
any academic or institutional affiliations I have. <br>
</p>
<p>Firstly, I want to clarify some terms. AI and machine learning are
a very broad space - they're used in almost every vertical - defence,
aerospace, marketing, business analytics, education and so on. A
large language model (LLM) is a specific type of machine learning
model that is _predictive_ and _generative_. Given a prompt, it
responds with the output that best matches the prompt _based on
the data it has been trained on_. Other types of machine learning
models (for example, image generators like DALL-E 2) use different
algorithms and are trained on different types of data. If you
train an LLM on a particular set of data, for example, "movies",
then its predictions will reflect that data.</p>
<p>## Distinguishing open source models from open source data <br>
</p>
<p>Models are built on data. <br>
</p>
<p>While a model can be open sourced, that isn't enough to make it
"open source AI", IMHO - and the Open Source Initiative is having
a broader conversation around what is meant by "open source AI"
[5]. The algorithm, code, dependencies etc. are all required for
reproducibility before a model is "truly open", IMHO. </p>
<p>The key thing to keep in mind is that models are trained on
*data* - it's not so important where the model is developed, but
it's really important where the data comes from (this is the focus
of the movement called "data-centric AI" [0]). <br>
</p>
<p>## LLMs in the Australian context <br>
</p>
<p>The challenge here for the Australian context is that many LLMs
are either not trained on Australian-specific data, _or_ their
predictions do not mirror the Australian context because their
training set has little Australian content compared to American
content. The context can be specified in the prompt - for example,
"Please create a news headline and two paragraphs of copy of an
important event in Melbourne in 2002. Use Australian English." -
and ChatGPT (3), fed this prompt, will use Australian spelling and
Melbourne-specific landmarks. ChatGPT also seems to have some
grounding in Indigenous knowledges, such as Bunjil. But if I ask
ChatGPT the question "There's a bingle at Broady and the Western's
chokkas back to the servo. What should I do?" then it quickly
degrades to general advice, rather than context-specific
suggestions ("Mate, can you turn off to Donnybrook Road or Plenty
Road?"). <br>
</p>
<p>AFAIK there are no open source, public LLMs being developed
within Australia. </p>
<p>The main academic conference for LLM work in Australia (which is
considered less prestigious than international conferences such as
NeurIPS [1] and ICML [2]) is the Australasian Language Technology
Conference (ALTA), which I attended in December. My notes from
this are public [3]. The main focus of this conference was the
application of LLM technology to healthcare applications - such as
mining medical records to assist health professionals in making
accurate and timely diagnoses. These language models *are* being
made open source, but they are smaller and much more specific than
ChatGPT. </p>
<p>Indeed, there was a lot of conversation at the conference about
the *need* for a research or open source LLM in Australia because
the costs of using ChatGPT and others (Claude, Bard) add up
quickly. <br>
</p>
<p>In summary, there is a lot of scope for creating an
Australian-specific, open source LLM. AFAIK, one doesn't exist. <br>
</p>
<p>## Other open source LLM efforts <br>
</p>
<p>The main open source LLM efforts are: <br>
</p>
<p>* EleutherAI - <a class="moz-txt-link-freetext" href="https://www.eleuther.ai/">https://www.eleuther.ai/</a></p>
<p>* Llama 2 by Meta</p>
<p>* Falcon by the Technology Innovation Institute - <a class="moz-txt-link-freetext" href="https://falconllm.tii.ae/">https://falconllm.tii.ae/</a></p>
<p>* Mistral AI <br>
</p>
<p>None of these efforts are Australian-based. <br>
</p>
<p>## Other Australian open AI efforts <br>
</p>
<p>Many universities are tackling the issue of AI-generated /
LLM-generated text by using AI-detection tools, primarily within the
Turnitin suite. Turnitin LLC is headquartered in California, USA.
</p>
<p>From conversations I've had with archivists in Australian
collection institutions, there is also a need for
Australian-specific speech recognition tools - because tools like
Whisper do not recognise Australian-accented speech as accurately
as other accents [4]. These tools are being used to transcribe
audiovisual archives. Again, a lot of this comes down to the *data*
that the model is trained on. The key problem here is that Whisper
was trained on 680,000 hours of speech data. To train a comparable
model, you would need hundreds of thousands of hours of
Australian-accented speech. The AusTalk archive, for example, from
memory has maybe a couple of thousand hours (it's offline so I
can't check). <br>
</p>
<p>In the last week, we've also seen rapid advances in voice
synthesis - with MyShell releasing OpenVoice voice cloning
technology [6]. The *model* is openly available, but again, the
data, algorithms and code are not. The challenge here for
Australia is - do we want to be able to clone Australian-accented
voices? (Yeah, nah ;-) There are no Australian TTS / voice cloning
efforts that are open source, AFAIK. This raises major ethical
questions for the likes of the ATO, which uses voice recognition
(and which has previously been spoofed [7]). <br>
</p>
<p>AI is rapidly being adopted into cybersecurity efforts,
particularly in the field of adversarial AI. These capabilities
are predominantly the domain of the Acronym Agencies (ASIO, ASD
etc.), and the folks at BSides might be useful to talk to about
Australian efforts here. </p>
<p>In terms of Australian AI institutes, there are a few: <br>
</p>
<p>* The Data61 CSIRO National Artificial Intelligence Centre -
which doesn't actually produce any AI or ML, its remit is to
encourage AI adoption -
<a class="moz-txt-link-freetext" href="https://www.csiro.au/en/work-with-us/industries/technology/National-AI-Centre">https://www.csiro.au/en/work-with-us/industries/technology/National-AI-Centre</a></p>
<p>* Australian Institute for Machine Learning at University of
Adelaide - research into AI / ML -
<a class="moz-txt-link-freetext" href="https://www.adelaide.edu.au/aiml/">https://www.adelaide.edu.au/aiml/</a></p>
<p>* A2I2 Institute at Deakin University -
<a class="moz-txt-link-freetext" href="https://a2i2.deakin.edu.au/">https://a2i2.deakin.edu.au/</a></p>
<p>* UNSW AI Institute - <a class="moz-txt-link-freetext" href="https://www.unsw.edu.au/unsw-ai">https://www.unsw.edu.au/unsw-ai</a></p>
<p>Invariably, the academic institutions offer various (mostly
postgrad) courses, with a heavy emphasis on "engaging
industry" (read: getting industry to fund AI research because the
government's research funding is paltry). <br>
<br>
</p>
<p>## The problem of national capability and why the business
adoption centres are exacerbating rather than addressing this,
IMHO<br>
</p>
<p>(I was about to write "sovereign capability" but, as I was
recently and correctly reminded, sovereignty was never ceded). <br>
</p>
<p>This all brings me to the key problem I have with the business
adoption initiative. What I've outlined above is that Australia
has very little national capability in AI, and even less in open
source AI. What we adopt is, predominantly, American-owned AI that
might then be shoe-horned into an Australian context. Sure,
businesses should be looking to adopt AI to remain competitive.
But the only AI they can adopt at the moment is, largely, American
AI. <br>
</p>
<p>What the business adoption initiative seeks to do is spur
*adoption* rather than *development*. What I would like to see
happen is the development of national AI capability, preferably in
the form of open source products that can be used by Australian
businesses and organisations nationally. Perhaps one of our
national organisations should focus on that, rather than
encouraging Australian businesses to spend money overseas ... <br>
</p>
<p>Kind regards, <br>
</p>
<p>Kathy Reid <br>
</p>
<p><br>
</p>
<p>[0] <a class="moz-txt-link-freetext" href="https://dcai.csail.mit.edu/">https://dcai.csail.mit.edu/</a></p>
<p>[1] <a class="moz-txt-link-freetext" href="https://neurips.cc/">https://neurips.cc/</a><br>
</p>
<p>[2] <a class="moz-txt-link-freetext" href="https://icml.cc/">https://icml.cc/</a></p>
<p>[3] <a class="moz-txt-link-freetext" href="https://blog.kathyreid.id.au/2023/12/10/alta2023/">https://blog.kathyreid.id.au/2023/12/10/alta2023/</a></p>
<p>[4] My PhD research, forthcoming</p>
<p>[5]
<a class="moz-txt-link-freetext" href="https://blog.opensource.org/open-source-ai-establishing-a-common-ground/">https://blog.opensource.org/open-source-ai-establishing-a-common-ground/</a></p>
<p>[6] <a class="moz-txt-link-freetext" href="https://research.myshell.ai/open-voice">https://research.myshell.ai/open-voice</a></p>
<p>[7] <a class="moz-txt-link-freetext" href="https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai">https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai</a><br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 7/1/24 14:54, Lyndsey Jackson via
linux-aus wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACwVwkWiEXM4WzF7a8+yPJOOAc_=MJm1QnoMugrwvbWqNV+=JA@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="auto">
<div dir="ltr">Hi all, <br>
</div>
<div dir="ltr">
<div><br>
</div>
<div>on a bit of a fact finding reach out for people or
connections from people working on open AI/LLM projects. </div>
<div><br>
</div>
<div>Late last year a proposal for AI Centres to help
SME's adopt AI dropped. <a
href="https://business.gov.au/grants-and-programs/artificial-intelligence-ai-adopt-program"
style="text-decoration-line:none;color:rgb(66,133,244)"
moz-do-not-send="true" class="moz-txt-link-freetext">https://business.gov.au/grants-and-programs/artificial-intelligence-ai-adopt-program</a> Before
the holiday break I did some work on a proposal concept for
agricultural value add (which is very, very broad), and I
have insight into how some key groups were considering
approaching it. </div>
<div dir="auto"><br>
</div>
<div>And if you have any tech/advice/ideas/groups please let
me know, I might not get a group to put a bid in but that's
ok. I still want to know what's happening in open source and
who is working on it. </div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks, </div>
<div dir="auto"><br>
</div>
<div dir="auto">Lyndsey<br>
<div dir="ltr" data-smartmail="gmail_signature">
<div dir="ltr">
<div style="font-size:small">
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
</blockquote>
</body>
</html>